linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 00/11] KVM: x86: track guest page access
@ 2016-02-14 11:31 Xiao Guangrong
  2016-02-14 11:31 ` [PATCH v3 01/11] KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowed Xiao Guangrong
                   ` (11 more replies)
  0 siblings, 12 replies; 39+ messages in thread
From: Xiao Guangrong @ 2016-02-14 11:31 UTC (permalink / raw)
  To: pbonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song, Xiao Guangrong

Changelog in v3:
- refine the code of mmu_need_write_protect() based on Huang Kai's suggestion
- rebase the patchset against current code

Changelog in v2:
- fix an issue where the tracking memory of a memslot was freed even if we
  only moved the memslot or changed its flags
- do not track gfns that are not mapped in memslots
- introduce the nolock APIs at the beginning of the patchset
- use 'unsigned short' as the track counter to reduce memory usage, which
  should be enough for shadow page tables and KVMGT

This patchset introduces a feature that allows us to track page access
in the guest. Currently, only write access tracking is implemented.

Four APIs are introduced:
- kvm_page_track_add_page(kvm, gfn, mode): the single guest page @gfn is
  added to the tracking pool of the guest instance represented by @kvm;
  @mode specifies which kind of access to @gfn is tracked
  
- kvm_page_track_remove_page(kvm, gfn, mode): the opposite operation of
  kvm_page_track_add_page(), which removes @gfn from the tracking pool.
  The gfn is no longer tracked after its last user is gone

- kvm_page_track_register_notifier(kvm, n): register a notifier so that
  events triggered by page tracking are received; the n->track_write()
  callback is then invoked

- kvm_page_track_unregister_notifier(kvm, n): the opposite operation of
  kvm_page_track_register_notifier(), which unlinks the notifier and
  stops delivery of tracking events

The first user of page tracking is non-leaf shadow page tables, as they
are always write protected. It also brings a performance improvement,
because page tracking speeds up the page fault handler for tracked
pages. The performance results for a kernel build are as follows:

   before           after
real 461.63       real 455.48
user 4529.55      user 4557.88
sys 1995.39       sys 1922.57

Furthermore, it is the infrastructure for other kinds of shadow page
tables, such as the GPU shadow page table introduced by KVMGT (1) and
native nested IOMMU.

This patchset can be divided into two parts:
- patches 1 ~ 7 implement page tracking
- the remaining patches apply page tracking to non-leaf shadow page tables

(1): http://lkml.iu.edu/hypermail/linux/kernel/1510.3/01562.html

Xiao Guangrong (11):
  KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowed
  KVM: MMU: introduce kvm_mmu_gfn_{allow,disallow}_lpage
  KVM: MMU: introduce kvm_mmu_slot_gfn_write_protect
  KVM: page track: add the framework of guest page tracking
  KVM: page track: introduce kvm_page_track_{add,remove}_page
  KVM: MMU: let page fault handler be aware tracked page
  KVM: page track: add notifier support
  KVM: MMU: use page track for non-leaf shadow pages
  KVM: MMU: simplify mmu_need_write_protect
  KVM: MMU: clear write-flooding on the fast path of tracked page
  KVM: MMU: apply page track notifier

 Documentation/virtual/kvm/mmu.txt     |   6 +-
 arch/x86/include/asm/kvm_host.h       |  12 +-
 arch/x86/include/asm/kvm_page_track.h |  67 +++++++++
 arch/x86/kvm/Makefile                 |   3 +-
 arch/x86/kvm/mmu.c                    | 209 ++++++++++++++++++---------
 arch/x86/kvm/mmu.h                    |   5 +
 arch/x86/kvm/page_track.c             | 257 ++++++++++++++++++++++++++++++++++
 arch/x86/kvm/paging_tmpl.h            |   5 +
 arch/x86/kvm/x86.c                    |  27 ++--
 9 files changed, 512 insertions(+), 79 deletions(-)
 create mode 100644 arch/x86/include/asm/kvm_page_track.h
 create mode 100644 arch/x86/kvm/page_track.c

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 01/11] KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowed
  2016-02-14 11:31 [PATCH v3 00/11] KVM: x86: track guest page access Xiao Guangrong
@ 2016-02-14 11:31 ` Xiao Guangrong
  2016-02-19 11:08   ` Paolo Bonzini
  2016-02-14 11:31 ` [PATCH v3 02/11] KVM: MMU: introduce kvm_mmu_gfn_{allow,disallow}_lpage Xiao Guangrong
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Xiao Guangrong @ 2016-02-14 11:31 UTC (permalink / raw)
  To: pbonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song, Xiao Guangrong

kvm_lpage_info->write_count is used to detect whether a large page
mapping for the gfn on the specified level is allowed; rename it to
disallow_lpage to reflect its purpose. Also rename
has_wrprotected_page() to mmu_gfn_lpage_is_disallowed() to make the
code clearer.

Later we will extend this mechanism for page tracking: if a gfn is
tracked, then a large mapping for that gfn on any level is not allowed.
The new name is more straightforward.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 Documentation/virtual/kvm/mmu.txt |  6 +++---
 arch/x86/include/asm/kvm_host.h   |  2 +-
 arch/x86/kvm/mmu.c                | 25 +++++++++++++------------
 arch/x86/kvm/x86.c                | 14 ++++++++------
 4 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
index daf9c0f..dda2e93 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -391,11 +391,11 @@ To instantiate a large spte, four constraints must be satisfied:
   write-protected pages
 - the guest page must be wholly contained by a single memory slot
 
-To check the last two conditions, the mmu maintains a ->write_count set of
+To check the last two conditions, the mmu maintains a ->disallow_lpage set of
 arrays for each memory slot and large page size.  Every write protected page
-causes its write_count to be incremented, thus preventing instantiation of
+causes its disallow_lpage to be incremented, thus preventing instantiation of
 a large spte.  The frames at the end of an unaligned memory slot have
-artificially inflated ->write_counts so they can never be instantiated.
+artificially inflated ->disallow_lpages so they can never be instantiated.
 
 Zapping all pages (page generation count)
 =========================================
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7dd6d55..e1c1f57 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -644,7 +644,7 @@ struct kvm_vcpu_arch {
 };
 
 struct kvm_lpage_info {
-	int write_count;
+	int disallow_lpage;
 };
 
 struct kvm_arch_memory_slot {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 95a955d..de9e992 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -789,7 +789,7 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	slot = __gfn_to_memslot(slots, gfn);
 	for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
 		linfo = lpage_info_slot(gfn, slot, i);
-		linfo->write_count += 1;
+		linfo->disallow_lpage += 1;
 	}
 	kvm->arch.indirect_shadow_pages++;
 }
@@ -807,31 +807,32 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	slot = __gfn_to_memslot(slots, gfn);
 	for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
 		linfo = lpage_info_slot(gfn, slot, i);
-		linfo->write_count -= 1;
-		WARN_ON(linfo->write_count < 0);
+		linfo->disallow_lpage -= 1;
+		WARN_ON(linfo->disallow_lpage < 0);
 	}
 	kvm->arch.indirect_shadow_pages--;
 }
 
-static int __has_wrprotected_page(gfn_t gfn, int level,
-				  struct kvm_memory_slot *slot)
+static bool __mmu_gfn_lpage_is_disallowed(gfn_t gfn, int level,
+					  struct kvm_memory_slot *slot)
 {
 	struct kvm_lpage_info *linfo;
 
 	if (slot) {
 		linfo = lpage_info_slot(gfn, slot, level);
-		return linfo->write_count;
+		return !!linfo->disallow_lpage;
 	}
 
-	return 1;
+	return true;
 }
 
-static int has_wrprotected_page(struct kvm_vcpu *vcpu, gfn_t gfn, int level)
+static bool mmu_gfn_lpage_is_disallowed(struct kvm_vcpu *vcpu, gfn_t gfn,
+					int level)
 {
 	struct kvm_memory_slot *slot;
 
 	slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
-	return __has_wrprotected_page(gfn, level, slot);
+	return __mmu_gfn_lpage_is_disallowed(gfn, level, slot);
 }
 
 static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
@@ -897,7 +898,7 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn,
 	max_level = min(kvm_x86_ops->get_lpage_level(), host_level);
 
 	for (level = PT_DIRECTORY_LEVEL; level <= max_level; ++level)
-		if (__has_wrprotected_page(large_gfn, level, slot))
+		if (__mmu_gfn_lpage_is_disallowed(large_gfn, level, slot))
 			break;
 
 	return level - 1;
@@ -2503,7 +2504,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		 * be fixed if guest refault.
 		 */
 		if (level > PT_PAGE_TABLE_LEVEL &&
-		    has_wrprotected_page(vcpu, gfn, level))
+		    mmu_gfn_lpage_is_disallowed(vcpu, gfn, level))
 			goto done;
 
 		spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;
@@ -2768,7 +2769,7 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
 	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) &&
 	    level == PT_PAGE_TABLE_LEVEL &&
 	    PageTransCompound(pfn_to_page(pfn)) &&
-	    !has_wrprotected_page(vcpu, gfn, PT_DIRECTORY_LEVEL)) {
+	    !mmu_gfn_lpage_is_disallowed(vcpu, gfn, PT_DIRECTORY_LEVEL)) {
 		unsigned long mask;
 		/*
 		 * mmu_notifier_retry was successful and we hold the
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cf15bc5..f448e64 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7903,6 +7903,7 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
 	int i;
 
 	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
+		struct kvm_lpage_info *linfo;
 		unsigned long ugfn;
 		int lpages;
 		int level = i + 1;
@@ -7917,15 +7918,16 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
 		if (i == 0)
 			continue;
 
-		slot->arch.lpage_info[i - 1] = kvm_kvzalloc(lpages *
-					sizeof(*slot->arch.lpage_info[i - 1]));
-		if (!slot->arch.lpage_info[i - 1])
+		linfo = kvm_kvzalloc(lpages * sizeof(*linfo));
+		if (!linfo)
 			goto out_free;
 
+		slot->arch.lpage_info[i - 1] = linfo;
+
 		if (slot->base_gfn & (KVM_PAGES_PER_HPAGE(level) - 1))
-			slot->arch.lpage_info[i - 1][0].write_count = 1;
+			linfo[0].disallow_lpage = 1;
 		if ((slot->base_gfn + npages) & (KVM_PAGES_PER_HPAGE(level) - 1))
-			slot->arch.lpage_info[i - 1][lpages - 1].write_count = 1;
+			linfo[lpages - 1].disallow_lpage = 1;
 		ugfn = slot->userspace_addr >> PAGE_SHIFT;
 		/*
 		 * If the gfn and userspace address are not aligned wrt each
@@ -7937,7 +7939,7 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
 			unsigned long j;
 
 			for (j = 0; j < lpages; ++j)
-				slot->arch.lpage_info[i - 1][j].write_count = 1;
+				linfo[j].disallow_lpage = 1;
 		}
 	}
 
-- 
1.8.3.1


* [PATCH v3 02/11] KVM: MMU: introduce kvm_mmu_gfn_{allow,disallow}_lpage
  2016-02-14 11:31 [PATCH v3 00/11] KVM: x86: track guest page access Xiao Guangrong
  2016-02-14 11:31 ` [PATCH v3 01/11] KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowed Xiao Guangrong
@ 2016-02-14 11:31 ` Xiao Guangrong
  2016-02-19 11:09   ` Paolo Bonzini
  2016-02-14 11:31 ` [PATCH v3 03/11] KVM: MMU: introduce kvm_mmu_slot_gfn_write_protect Xiao Guangrong
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Xiao Guangrong @ 2016-02-14 11:31 UTC (permalink / raw)
  To: pbonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song, Xiao Guangrong

Abstract the common operations out of account_shadowed() and
unaccount_shadowed(), then introduce kvm_mmu_gfn_disallow_lpage()
and kvm_mmu_gfn_allow_lpage().

These two functions will be used by page tracking in a later patch.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/kvm/mmu.c | 38 +++++++++++++++++++++++++-------------
 arch/x86/kvm/mmu.h |  3 +++
 2 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index de9e992..e1bb66c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -776,21 +776,39 @@ static struct kvm_lpage_info *lpage_info_slot(gfn_t gfn,
 	return &slot->arch.lpage_info[level - 2][idx];
 }
 
+static void update_gfn_disallow_lpage_count(struct kvm_memory_slot *slot,
+					    gfn_t gfn, int count)
+{
+	struct kvm_lpage_info *linfo;
+	int i;
+
+	for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
+		linfo = lpage_info_slot(gfn, slot, i);
+		linfo->disallow_lpage += count;
+		WARN_ON(linfo->disallow_lpage < 0);
+	}
+}
+
+void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	update_gfn_disallow_lpage_count(slot, gfn, 1);
+}
+
+void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	update_gfn_disallow_lpage_count(slot, gfn, -1);
+}
+
 static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *slot;
-	struct kvm_lpage_info *linfo;
 	gfn_t gfn;
-	int i;
 
 	gfn = sp->gfn;
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
 	slot = __gfn_to_memslot(slots, gfn);
-	for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
-		linfo = lpage_info_slot(gfn, slot, i);
-		linfo->disallow_lpage += 1;
-	}
+	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 	kvm->arch.indirect_shadow_pages++;
 }
 
@@ -798,18 +816,12 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *slot;
-	struct kvm_lpage_info *linfo;
 	gfn_t gfn;
-	int i;
 
 	gfn = sp->gfn;
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
 	slot = __gfn_to_memslot(slots, gfn);
-	for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
-		linfo = lpage_info_slot(gfn, slot, i);
-		linfo->disallow_lpage -= 1;
-		WARN_ON(linfo->disallow_lpage < 0);
-	}
+	kvm_mmu_gfn_allow_lpage(slot, gfn);
 	kvm->arch.indirect_shadow_pages--;
 }
 
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 55ffb7b..de92bed 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -174,4 +174,7 @@ static inline bool permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 
 void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm);
 void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
+
+void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
+void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
 #endif
-- 
1.8.3.1


* [PATCH v3 03/11] KVM: MMU: introduce kvm_mmu_slot_gfn_write_protect
  2016-02-14 11:31 [PATCH v3 00/11] KVM: x86: track guest page access Xiao Guangrong
  2016-02-14 11:31 ` [PATCH v3 01/11] KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowed Xiao Guangrong
  2016-02-14 11:31 ` [PATCH v3 02/11] KVM: MMU: introduce kvm_mmu_gfn_{allow,disallow}_lpage Xiao Guangrong
@ 2016-02-14 11:31 ` Xiao Guangrong
  2016-02-19 11:18   ` Paolo Bonzini
  2016-02-14 11:31 ` [PATCH v3 04/11] KVM: page track: add the framework of guest page tracking Xiao Guangrong
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Xiao Guangrong @ 2016-02-14 11:31 UTC (permalink / raw)
  To: pbonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song, Xiao Guangrong

Split rmap_write_protect() and introduce a function that abstracts the
write protection based on the slot.

This function will be used in a later patch.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/kvm/mmu.c | 16 +++++++++++-----
 arch/x86/kvm/mmu.h |  2 ++
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e1bb66c..edad3c7 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1336,23 +1336,29 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 		kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
 }
 
-static bool rmap_write_protect(struct kvm_vcpu *vcpu, u64 gfn)
+bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
+				    struct kvm_memory_slot *slot, u64 gfn)
 {
-	struct kvm_memory_slot *slot;
 	struct kvm_rmap_head *rmap_head;
 	int i;
 	bool write_protected = false;
 
-	slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
-
 	for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
 		rmap_head = __gfn_to_rmap(gfn, i, slot);
-		write_protected |= __rmap_write_protect(vcpu->kvm, rmap_head, true);
+		write_protected |= __rmap_write_protect(kvm, rmap_head, true);
 	}
 
 	return write_protected;
 }
 
+static bool rmap_write_protect(struct kvm_vcpu *vcpu, u64 gfn)
+{
+	struct kvm_memory_slot *slot;
+
+	slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+	return kvm_mmu_slot_gfn_write_protect(vcpu->kvm, slot, gfn);
+}
+
 static bool kvm_zap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
 {
 	u64 *sptep;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index de92bed..58fe98a 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -177,4 +177,6 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
 
 void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
 void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
+bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
+				    struct kvm_memory_slot *slot, u64 gfn);
 #endif
-- 
1.8.3.1


* [PATCH v3 04/11] KVM: page track: add the framework of guest page tracking
  2016-02-14 11:31 [PATCH v3 00/11] KVM: x86: track guest page access Xiao Guangrong
                   ` (2 preceding siblings ...)
  2016-02-14 11:31 ` [PATCH v3 03/11] KVM: MMU: introduce kvm_mmu_slot_gfn_write_protect Xiao Guangrong
@ 2016-02-14 11:31 ` Xiao Guangrong
  2016-02-19 11:24   ` Paolo Bonzini
  2016-02-14 11:31 ` [PATCH v3 05/11] KVM: page track: introduce kvm_page_track_{add,remove}_page Xiao Guangrong
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Xiao Guangrong @ 2016-02-14 11:31 UTC (permalink / raw)
  To: pbonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song, Xiao Guangrong

The array gfn_track[mode][gfn] is introduced in the memory slot for
every guest page; it is the tracking count for the guest page in the
different modes. If the page is tracked, the count is incremented; the
page is no longer tracked once the count reaches zero.

We use 'unsigned short' as the tracking count, which should be enough,
as the shadow page table can use at most 2^14 distinct roles (2^3 for
level, 2^1 for cr4_pae, 2^2 for quadrant, 2^3 for access, 2^1 for nxe,
2^1 for cr0_wp, 2^1 for smep_andnot_wp, 2^1 for smap_andnot_wp, and 2^1
for smm); this leaves enough room for other trackers.

Two functions, kvm_page_track_create_memslot() and
kvm_page_track_free_memslot(), are implemented in this patch; they are
used internally to initialize and reclaim the memory of the array.

Currently, only the write track mode is supported.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h       |  2 ++
 arch/x86/include/asm/kvm_page_track.h | 13 +++++++++
 arch/x86/kvm/Makefile                 |  3 +-
 arch/x86/kvm/page_track.c             | 52 +++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c                    |  5 ++++
 5 files changed, 74 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/include/asm/kvm_page_track.h
 create mode 100644 arch/x86/kvm/page_track.c

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e1c1f57..d8931d0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -32,6 +32,7 @@
 #include <asm/mtrr.h>
 #include <asm/msr-index.h>
 #include <asm/asm.h>
+#include <asm/kvm_page_track.h>
 
 #define KVM_MAX_VCPUS 255
 #define KVM_SOFT_MAX_VCPUS 160
@@ -650,6 +651,7 @@ struct kvm_lpage_info {
 struct kvm_arch_memory_slot {
 	struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES];
 	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
+	unsigned short *gfn_track[KVM_PAGE_TRACK_MAX];
 };
 
 /*
diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
new file mode 100644
index 0000000..55200406
--- /dev/null
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -0,0 +1,13 @@
+#ifndef _ASM_X86_KVM_PAGE_TRACK_H
+#define _ASM_X86_KVM_PAGE_TRACK_H
+
+enum kvm_page_track_mode {
+	KVM_PAGE_TRACK_WRITE,
+	KVM_PAGE_TRACK_MAX,
+};
+
+void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
+				 struct kvm_memory_slot *dont);
+int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
+				  unsigned long npages);
+#endif
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index a1ff508..464fa47 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -13,9 +13,10 @@ kvm-$(CONFIG_KVM_ASYNC_PF)	+= $(KVM)/async_pf.o
 
 kvm-y			+= x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
 			   i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
-			   hyperv.o
+			   hyperv.o page_track.o
 
 kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)	+= assigned-dev.o iommu.o
+
 kvm-intel-y		+= vmx.o pmu_intel.o
 kvm-amd-y		+= svm.o pmu_amd.o
 
diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
new file mode 100644
index 0000000..8c396d0
--- /dev/null
+++ b/arch/x86/kvm/page_track.c
@@ -0,0 +1,52 @@
+/*
+ * Support KVM guest page tracking
+ *
+ * This feature allows us to track page access in guest. Currently, only
+ * write access is tracked.
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ *   Xiao Guangrong <guangrong.xiao@linux.intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_host.h>
+#include <asm/kvm_page_track.h>
+
+#include "mmu.h"
+
+void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
+				 struct kvm_memory_slot *dont)
+{
+	int i;
+
+	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++)
+		if (!dont || free->arch.gfn_track[i] !=
+		      dont->arch.gfn_track[i]) {
+			kvfree(free->arch.gfn_track[i]);
+			free->arch.gfn_track[i] = NULL;
+		}
+}
+
+int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
+				  unsigned long npages)
+{
+	int  i;
+
+	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
+		slot->arch.gfn_track[i] = kvm_kvzalloc(npages *
+					    sizeof(*slot->arch.gfn_track[i]));
+		if (!slot->arch.gfn_track[i])
+			goto track_free;
+	}
+
+	return 0;
+
+track_free:
+	kvm_page_track_free_memslot(slot, NULL);
+	return -ENOMEM;
+}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f448e64..e25ebb7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7895,6 +7895,8 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
 			free->arch.lpage_info[i - 1] = NULL;
 		}
 	}
+
+	kvm_page_track_free_memslot(free, dont);
 }
 
 int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
@@ -7943,6 +7945,9 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
 		}
 	}
 
+	if (kvm_page_track_create_memslot(slot, npages))
+		goto out_free;
+
 	return 0;
 
 out_free:
-- 
1.8.3.1


* [PATCH v3 05/11] KVM: page track: introduce kvm_page_track_{add,remove}_page
  2016-02-14 11:31 [PATCH v3 00/11] KVM: x86: track guest page access Xiao Guangrong
                   ` (3 preceding siblings ...)
  2016-02-14 11:31 ` [PATCH v3 04/11] KVM: page track: add the framework of guest page tracking Xiao Guangrong
@ 2016-02-14 11:31 ` Xiao Guangrong
  2016-02-19 11:37   ` Paolo Bonzini
  2016-02-19 11:37   ` Paolo Bonzini
  2016-02-14 11:31 ` [PATCH v3 06/11] KVM: MMU: let page fault handler be aware tracked page Xiao Guangrong
                   ` (6 subsequent siblings)
  11 siblings, 2 replies; 39+ messages in thread
From: Xiao Guangrong @ 2016-02-14 11:31 UTC (permalink / raw)
  To: pbonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song, Xiao Guangrong

These two functions are the user APIs:
- kvm_page_track_add_page(): add the page to the tracking pool;
  afterwards, the specified access to that page will be tracked

- kvm_page_track_remove_page(): remove the page from the tracking pool;
  the specified access to the page is no longer tracked after the last
  user is gone

Both of these are called under the protection of kvm->srcu or
kvm->slots_lock.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/include/asm/kvm_page_track.h |  13 ++++
 arch/x86/kvm/page_track.c             | 124 ++++++++++++++++++++++++++++++++++
 2 files changed, 137 insertions(+)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 55200406..c010124 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -10,4 +10,17 @@ void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
 				 struct kvm_memory_slot *dont);
 int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
 				  unsigned long npages);
+
+void
+kvm_slot_page_track_add_page_nolock(struct kvm *kvm,
+				    struct kvm_memory_slot *slot, gfn_t gfn,
+				    enum kvm_page_track_mode mode);
+void kvm_page_track_add_page(struct kvm *kvm, gfn_t gfn,
+			     enum kvm_page_track_mode mode);
+void kvm_slot_page_track_remove_page_nolock(struct kvm *kvm,
+					    struct kvm_memory_slot *slot,
+					    gfn_t gfn,
+					    enum kvm_page_track_mode mode);
+void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
+				enum kvm_page_track_mode mode);
 #endif
diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
index 8c396d0..e17efe9 100644
--- a/arch/x86/kvm/page_track.c
+++ b/arch/x86/kvm/page_track.c
@@ -50,3 +50,127 @@ track_free:
 	kvm_page_track_free_memslot(slot, NULL);
 	return -ENOMEM;
 }
+
+static bool check_mode(enum kvm_page_track_mode mode)
+{
+	if (mode < 0 || mode >= KVM_PAGE_TRACK_MAX)
+		return false;
+
+	return true;
+}
+
+static void update_gfn_track(struct kvm_memory_slot *slot, gfn_t gfn,
+			     enum kvm_page_track_mode mode, short count)
+{
+	int index;
+	unsigned short val;
+
+	index = gfn_to_index(gfn, slot->base_gfn, PT_PAGE_TABLE_LEVEL);
+
+	val = slot->arch.gfn_track[mode][index];
+
+	/* does tracking count wrap? */
+	WARN_ON((count > 0) && (val + count < val));
+	/* the last tracker has already gone? */
+	WARN_ON((count < 0) && (val < -count));
+
+	slot->arch.gfn_track[mode][index] += count;
+}
+
+void
+kvm_slot_page_track_add_page_nolock(struct kvm *kvm,
+				    struct kvm_memory_slot *slot, gfn_t gfn,
+				    enum kvm_page_track_mode mode)
+{
+
+	WARN_ON(!check_mode(mode));
+
+	update_gfn_track(slot, gfn, mode, 1);
+
+	/*
+	 * new track stops large page mapping for the
+	 * tracked page.
+	 */
+	kvm_mmu_gfn_disallow_lpage(slot, gfn);
+
+	if (mode == KVM_PAGE_TRACK_WRITE)
+		if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn))
+			kvm_flush_remote_tlbs(kvm);
+}
+
+/*
+ * add guest page to the tracking pool so that corresponding access on that
+ * page will be intercepted.
+ *
+ * It should be called under the protection of kvm->srcu or kvm->slots_lock
+ *
+ * @kvm: the guest instance we are interested in.
+ * @gfn: the guest page.
+ * @mode: tracking mode, currently only write track is supported.
+ */
+void kvm_page_track_add_page(struct kvm *kvm, gfn_t gfn,
+			     enum kvm_page_track_mode mode)
+{
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *slot;
+	int i;
+
+	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+		slots = __kvm_memslots(kvm, i);
+
+		slot = __gfn_to_memslot(slots, gfn);
+		if (!slot)
+			continue;
+
+		spin_lock(&kvm->mmu_lock);
+		kvm_slot_page_track_add_page_nolock(kvm, slot, gfn, mode);
+		spin_unlock(&kvm->mmu_lock);
+	}
+}
+
+void kvm_slot_page_track_remove_page_nolock(struct kvm *kvm,
+					    struct kvm_memory_slot *slot,
+					    gfn_t gfn,
+					    enum kvm_page_track_mode mode)
+{
+	WARN_ON(!check_mode(mode));
+
+	update_gfn_track(slot, gfn, mode, -1);
+
+	/*
+	 * allow large page mapping for the tracked page
+	 * after the tracker is gone.
+	 */
+	kvm_mmu_gfn_allow_lpage(slot, gfn);
+}
+
+/*
+ * remove the guest page from the tracking pool which stops the interception
+ * of corresponding access on that page. It is the opposed operation of
+ * kvm_page_track_add_page().
+ *
+ * It should be called under the protection of kvm->srcu or kvm->slots_lock
+ *
+ * @kvm: the guest instance we are interested in.
+ * @gfn: the guest page.
+ * @mode: tracking mode, currently only write track is supported.
+ */
+void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
+				enum kvm_page_track_mode mode)
+{
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *slot;
+	int i;
+
+	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+		slots = __kvm_memslots(kvm, i);
+
+		slot = __gfn_to_memslot(slots, gfn);
+		if (!slot)
+			continue;
+
+		spin_lock(&kvm->mmu_lock);
+		kvm_slot_page_track_remove_page_nolock(kvm, slot, gfn, mode);
+		spin_unlock(&kvm->mmu_lock);
+	}
+}
-- 
1.8.3.1


* [PATCH v3 06/11] KVM: MMU: let page fault handler be aware tracked page
  2016-02-14 11:31 [PATCH v3 00/11] KVM: x86: track guest page access Xiao Guangrong
                   ` (4 preceding siblings ...)
  2016-02-14 11:31 ` [PATCH v3 05/11] KVM: page track: introduce kvm_page_track_{add,remove}_page Xiao Guangrong
@ 2016-02-14 11:31 ` Xiao Guangrong
  2016-02-19 11:45   ` Paolo Bonzini
  2016-02-14 11:31 ` [PATCH v3 07/11] KVM: page track: add notifier support Xiao Guangrong
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Xiao Guangrong @ 2016-02-14 11:31 UTC (permalink / raw)
  To: pbonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song, Xiao Guangrong

A page fault caused by a write access to a write-tracked page cannot be
fixed; it always needs to be emulated. page_fault_handle_page_track()
is the fast path we introduce here to skip holding the mmu-lock and
walking the shadow page table.

However, if the page table entry is not present, it is worth making it
present and read-only to keep read accesses happy.

mmu_need_write_protect() needs to be adjusted so that the page does not
become writable when making the page table entry present or when
syncing/prefetching shadow page table entries.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/include/asm/kvm_page_track.h |  2 ++
 arch/x86/kvm/mmu.c                    | 44 +++++++++++++++++++++++++++++------
 arch/x86/kvm/page_track.c             | 14 +++++++++++
 arch/x86/kvm/paging_tmpl.h            |  3 +++
 4 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index c010124..97ac9c3 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -23,4 +23,6 @@ void kvm_slot_page_track_remove_page_nolock(struct kvm *kvm,
 					    enum kvm_page_track_mode mode);
 void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
 				enum kvm_page_track_mode mode);
+bool kvm_page_track_check_mode(struct kvm_vcpu *vcpu, gfn_t gfn,
+			       enum kvm_page_track_mode mode);
 #endif
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index edad3c7..bd9c278 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -41,6 +41,7 @@
 #include <asm/cmpxchg.h>
 #include <asm/io.h>
 #include <asm/vmx.h>
+#include <asm/kvm_page_track.h>
 
 /*
  * When setting this variable to true it enables Two-Dimensional-Paging
@@ -2448,25 +2449,29 @@ static void kvm_unsync_pages(struct kvm_vcpu *vcpu,  gfn_t gfn)
 	}
 }
 
-static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
-				  bool can_unsync)
+static bool mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
+				   bool can_unsync)
 {
 	struct kvm_mmu_page *s;
 	bool need_unsync = false;
 
+	if (kvm_page_track_check_mode(vcpu, gfn, KVM_PAGE_TRACK_WRITE))
+		return true;
+
 	for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
 		if (!can_unsync)
-			return 1;
+			return true;
 
 		if (s->role.level != PT_PAGE_TABLE_LEVEL)
-			return 1;
+			return true;
 
 		if (!s->unsync)
 			need_unsync = true;
 	}
 	if (need_unsync)
 		kvm_unsync_pages(vcpu, gfn);
-	return 0;
+
+	return false;
 }
 
 static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
@@ -3381,10 +3386,30 @@ int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct)
 }
 EXPORT_SYMBOL_GPL(handle_mmio_page_fault);
 
+static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
+					 u32 error_code, gfn_t gfn)
+{
+	if (unlikely(error_code & PFERR_RSVD_MASK))
+		return false;
+
+	if (!(error_code & PFERR_PRESENT_MASK) ||
+	      !(error_code & PFERR_WRITE_MASK))
+		return false;
+
+	/*
+	 * guest is writing the page which is write tracked which can
+	 * not be fixed by page fault handler.
+	 */
+	if (kvm_page_track_check_mode(vcpu, gfn, KVM_PAGE_TRACK_WRITE))
+		return true;
+
+	return false;
+}
+
 static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
 				u32 error_code, bool prefault)
 {
-	gfn_t gfn;
+	gfn_t gfn = gva >> PAGE_SHIFT;
 	int r;
 
 	pgprintk("%s: gva %lx error %x\n", __func__, gva, error_code);
@@ -3396,13 +3421,15 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
 			return r;
 	}
 
+	if (page_fault_handle_page_track(vcpu, error_code, gfn))
+		return 1;
+
 	r = mmu_topup_memory_caches(vcpu);
 	if (r)
 		return r;
 
 	MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
 
-	gfn = gva >> PAGE_SHIFT;
 
 	return nonpaging_map(vcpu, gva & PAGE_MASK,
 			     error_code, gfn, prefault);
@@ -3486,6 +3513,9 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 			return r;
 	}
 
+	if (page_fault_handle_page_track(vcpu, error_code, gfn))
+		return 1;
+
 	r = mmu_topup_memory_caches(vcpu);
 	if (r)
 		return r;
diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
index e17efe9..de9b32f 100644
--- a/arch/x86/kvm/page_track.c
+++ b/arch/x86/kvm/page_track.c
@@ -174,3 +174,17 @@ void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
 		spin_unlock(&kvm->mmu_lock);
 	}
 }
+
+/*
+ * check if the corresponding access on the specified guest page is tracked.
+ */
+bool kvm_page_track_check_mode(struct kvm_vcpu *vcpu, gfn_t gfn,
+			       enum kvm_page_track_mode mode)
+{
+	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+	int index = gfn_to_index(gfn, slot->base_gfn, PT_PAGE_TABLE_LEVEL);
+
+	WARN_ON(!check_mode(mode));
+
+	return !!ACCESS_ONCE(slot->arch.gfn_track[mode][index]);
+}
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 6c9fed9..c3a30c2 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -735,6 +735,9 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
 		return 0;
 	}
 
+	if (page_fault_handle_page_track(vcpu, error_code, walker.gfn))
+		return 1;
+
 	vcpu->arch.write_fault_to_shadow_pgtable = false;
 
 	is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 07/11] KVM: page track: add notifier support
  2016-02-14 11:31 [PATCH v3 00/11] KVM: x86: track guest page access Xiao Guangrong
                   ` (5 preceding siblings ...)
  2016-02-14 11:31 ` [PATCH v3 06/11] KVM: MMU: let page fault handler be aware tracked page Xiao Guangrong
@ 2016-02-14 11:31 ` Xiao Guangrong
  2016-02-19 11:51   ` Paolo Bonzini
  2016-02-14 11:31 ` [PATCH v3 08/11] KVM: MMU: use page track for non-leaf shadow pages Xiao Guangrong
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Xiao Guangrong @ 2016-02-14 11:31 UTC (permalink / raw)
  To: pbonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song, Xiao Guangrong

A notifier list is introduced so that any node that wants to receive
track events can register itself on the list.

Two APIs are introduced here:
- kvm_page_track_register_notifier(): register the notifier to receive
  track events

- kvm_page_track_unregister_notifier(): stop receiving track events by
  unregistering the notifier

The callback, node->track_write(), is called when a write access to a
write-tracked page happens.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h       |  1 +
 arch/x86/include/asm/kvm_page_track.h | 39 ++++++++++++++++++++
 arch/x86/kvm/page_track.c             | 67 +++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c                    |  4 +++
 4 files changed, 111 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d8931d0..282bc2f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -696,6 +696,7 @@ struct kvm_arch {
 	 */
 	struct list_head active_mmu_pages;
 	struct list_head zapped_obsolete_pages;
+	struct kvm_page_track_notifier_head track_notifier_head;
 
 	struct list_head assigned_dev_head;
 	struct iommu_domain *iommu_domain;
diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 97ac9c3..1aae4ef 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -6,6 +6,36 @@ enum kvm_page_track_mode {
 	KVM_PAGE_TRACK_MAX,
 };
 
+/*
+ * The notifier represented by @kvm_page_track_notifier_node is linked into
+ * the head which will be notified when guest is triggering the track event.
+ *
+ * Write access on the head is protected by kvm->mmu_lock, read access
+ * is protected by track_srcu.
+ */
+struct kvm_page_track_notifier_head {
+	struct srcu_struct track_srcu;
+	struct hlist_head track_notifier_list;
+};
+
+struct kvm_page_track_notifier_node {
+	struct hlist_node node;
+
+	/*
+	 * It is called when guest is writing the write-tracked page
+	 * and write emulation is finished at that time.
+	 *
+	 * @vcpu: the vcpu where the write access happened.
+	 * @gpa: the physical address written by guest.
+	 * @new: the data was written to the address.
+	 * @bytes: the written length.
+	 */
+	void (*track_write)(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			    int bytes);
+};
+
+void kvm_page_track_init(struct kvm *kvm);
+
 void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
 				 struct kvm_memory_slot *dont);
 int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
@@ -25,4 +55,13 @@ void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
 				enum kvm_page_track_mode mode);
 bool kvm_page_track_check_mode(struct kvm_vcpu *vcpu, gfn_t gfn,
 			       enum kvm_page_track_mode mode);
+
+void
+kvm_page_track_register_notifier(struct kvm *kvm,
+				 struct kvm_page_track_notifier_node *n);
+void
+kvm_page_track_unregister_notifier(struct kvm *kvm,
+				   struct kvm_page_track_notifier_node *n);
+void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			  int bytes);
 #endif
diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
index de9b32f..0692cc6 100644
--- a/arch/x86/kvm/page_track.c
+++ b/arch/x86/kvm/page_track.c
@@ -188,3 +188,70 @@ bool kvm_page_track_check_mode(struct kvm_vcpu *vcpu, gfn_t gfn,
 
 	return !!ACCESS_ONCE(slot->arch.gfn_track[mode][index]);
 }
+
+void kvm_page_track_init(struct kvm *kvm)
+{
+	struct kvm_page_track_notifier_head *head;
+
+	head = &kvm->arch.track_notifier_head;
+	init_srcu_struct(&head->track_srcu);
+	INIT_HLIST_HEAD(&head->track_notifier_list);
+}
+
+/*
+ * register the notifier so that event interception for the tracked guest
+ * pages can be received.
+ */
+void
+kvm_page_track_register_notifier(struct kvm *kvm,
+				 struct kvm_page_track_notifier_node *n)
+{
+	struct kvm_page_track_notifier_head *head;
+
+	head = &kvm->arch.track_notifier_head;
+
+	spin_lock(&kvm->mmu_lock);
+	hlist_add_head_rcu(&n->node, &head->track_notifier_list);
+	spin_unlock(&kvm->mmu_lock);
+}
+
+/*
+ * stop receiving the event interception. It is the opposed operation of
+ * kvm_page_track_register_notifier().
+ */
+void
+kvm_page_track_unregister_notifier(struct kvm *kvm,
+				   struct kvm_page_track_notifier_node *n)
+{
+	struct kvm_page_track_notifier_head *head;
+
+	head = &kvm->arch.track_notifier_head;
+
+	spin_lock(&kvm->mmu_lock);
+	hlist_del_rcu(&n->node);
+	spin_unlock(&kvm->mmu_lock);
+	synchronize_srcu(&head->track_srcu);
+}
+
+/*
+ * Notify the node that write access is intercepted and write emulation is
+ * finished at this time.
+ *
+ * The node should figure out if the written page is the one that node is
+ * interested in by itself.
+ */
+void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			  int bytes)
+{
+	struct kvm_page_track_notifier_head *head;
+	struct kvm_page_track_notifier_node *n;
+	int idx;
+
+	head = &vcpu->kvm->arch.track_notifier_head;
+
+	idx = srcu_read_lock(&head->track_srcu);
+	hlist_for_each_entry_rcu(n, &head->track_notifier_list, node)
+		if (n->track_write)
+			n->track_write(vcpu, gpa, new, bytes);
+	srcu_read_unlock(&head->track_srcu, idx);
+}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e25ebb7..98019b6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4370,6 +4370,7 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
 	if (ret < 0)
 		return 0;
 	kvm_mmu_pte_write(vcpu, gpa, val, bytes);
+	kvm_page_track_write(vcpu, gpa, val, bytes);
 	return 1;
 }
 
@@ -4628,6 +4629,7 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
 
 	kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
 	kvm_mmu_pte_write(vcpu, gpa, new, bytes);
+	kvm_page_track_write(vcpu, gpa, new, bytes);
 
 	return X86EMUL_CONTINUE;
 
@@ -7748,6 +7750,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn);
 	INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn);
 
+	kvm_page_track_init(kvm);
+
 	return 0;
 }
 
-- 
1.8.3.1


* [PATCH v3 08/11] KVM: MMU: use page track for non-leaf shadow pages
  2016-02-14 11:31 [PATCH v3 00/11] KVM: x86: track guest page access Xiao Guangrong
                   ` (6 preceding siblings ...)
  2016-02-14 11:31 ` [PATCH v3 07/11] KVM: page track: add notifier support Xiao Guangrong
@ 2016-02-14 11:31 ` Xiao Guangrong
  2016-02-14 11:31 ` [PATCH v3 09/11] KVM: MMU: simplify mmu_need_write_protect Xiao Guangrong
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 39+ messages in thread
From: Xiao Guangrong @ 2016-02-14 11:31 UTC (permalink / raw)
  To: pbonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song, Xiao Guangrong

Non-leaf shadow pages are always write-protected, so they can be users
of page tracking.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/kvm/mmu.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index bd9c278..e9dbd85 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -806,11 +806,17 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	struct kvm_memory_slot *slot;
 	gfn_t gfn;
 
+	kvm->arch.indirect_shadow_pages++;
 	gfn = sp->gfn;
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
 	slot = __gfn_to_memslot(slots, gfn);
+
+	/* the non-leaf shadow pages are keeping readonly. */
+	if (sp->role.level > PT_PAGE_TABLE_LEVEL)
+		return kvm_slot_page_track_add_page_nolock(kvm, slot, gfn,
+							KVM_PAGE_TRACK_WRITE);
+
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
-	kvm->arch.indirect_shadow_pages++;
 }
 
 static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
@@ -819,11 +825,15 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	struct kvm_memory_slot *slot;
 	gfn_t gfn;
 
+	kvm->arch.indirect_shadow_pages--;
 	gfn = sp->gfn;
 	slots = kvm_memslots_for_spte_role(kvm, sp->role);
 	slot = __gfn_to_memslot(slots, gfn);
+	if (sp->role.level > PT_PAGE_TABLE_LEVEL)
+		return kvm_slot_page_track_remove_page_nolock(kvm, slot, gfn,
+							KVM_PAGE_TRACK_WRITE);
+
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
-	kvm->arch.indirect_shadow_pages--;
 }
 
 static bool __mmu_gfn_lpage_is_disallowed(gfn_t gfn, int level,
@@ -2132,12 +2142,18 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 	hlist_add_head(&sp->hash_link,
 		&vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]);
 	if (!direct) {
-		if (rmap_write_protect(vcpu, gfn))
+		/*
+		 * we should do write protection before syncing pages
+		 * otherwise the content of the synced shadow page may
+		 * be inconsistent with guest page table.
+		 */
+		account_shadowed(vcpu->kvm, sp);
+
+		if (level == PT_PAGE_TABLE_LEVEL &&
+		      rmap_write_protect(vcpu, gfn))
 			kvm_flush_remote_tlbs(vcpu->kvm);
 		if (level > PT_PAGE_TABLE_LEVEL && need_sync)
 			kvm_sync_pages(vcpu, gfn);
-
-		account_shadowed(vcpu->kvm, sp);
 	}
 	sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
 	clear_page(sp->spt);
-- 
1.8.3.1


* [PATCH v3 09/11] KVM: MMU: simplify mmu_need_write_protect
  2016-02-14 11:31 [PATCH v3 00/11] KVM: x86: track guest page access Xiao Guangrong
                   ` (7 preceding siblings ...)
  2016-02-14 11:31 ` [PATCH v3 08/11] KVM: MMU: use page track for non-leaf shadow pages Xiao Guangrong
@ 2016-02-14 11:31 ` Xiao Guangrong
  2016-02-14 11:31 ` [PATCH v3 10/11] KVM: MMU: clear write-flooding on the fast path of tracked page Xiao Guangrong
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 39+ messages in thread
From: Xiao Guangrong @ 2016-02-14 11:31 UTC (permalink / raw)
  To: pbonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song, Xiao Guangrong

Now that all non-leaf shadow pages are page-tracked, if a gfn is not
tracked then no non-leaf shadow page for that gfn exists, so we can
directly mark the shadow pages of the gfn unsync.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/kvm/mmu.c | 29 +++++++----------------------
 1 file changed, 7 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e9dbd85..4986615 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2444,7 +2444,7 @@ int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page);
 
-static void __kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+static void kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
 	trace_kvm_mmu_unsync_page(sp);
 	++vcpu->kvm->stat.mmu_unsync;
@@ -2453,39 +2453,24 @@ static void __kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 	kvm_mmu_mark_parents_unsync(sp);
 }
 
-static void kvm_unsync_pages(struct kvm_vcpu *vcpu,  gfn_t gfn)
-{
-	struct kvm_mmu_page *s;
-
-	for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
-		if (s->unsync)
-			continue;
-		WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
-		__kvm_unsync_page(vcpu, s);
-	}
-}
-
 static bool mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
 				   bool can_unsync)
 {
-	struct kvm_mmu_page *s;
-	bool need_unsync = false;
+	struct kvm_mmu_page *sp;
 
 	if (kvm_page_track_check_mode(vcpu, gfn, KVM_PAGE_TRACK_WRITE))
 		return true;
 
-	for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
+	for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn) {
 		if (!can_unsync)
 			return true;
 
-		if (s->role.level != PT_PAGE_TABLE_LEVEL)
-			return true;
+		if (sp->unsync)
+			continue;
 
-		if (!s->unsync)
-			need_unsync = true;
+		WARN_ON(sp->role.level != PT_PAGE_TABLE_LEVEL);
+		kvm_unsync_page(vcpu, sp);
 	}
-	if (need_unsync)
-		kvm_unsync_pages(vcpu, gfn);
 
 	return false;
 }
-- 
1.8.3.1


* [PATCH v3 10/11] KVM: MMU: clear write-flooding on the fast path of tracked page
  2016-02-14 11:31 [PATCH v3 00/11] KVM: x86: track guest page access Xiao Guangrong
                   ` (8 preceding siblings ...)
  2016-02-14 11:31 ` [PATCH v3 09/11] KVM: MMU: simplify mmu_need_write_protect Xiao Guangrong
@ 2016-02-14 11:31 ` Xiao Guangrong
  2016-02-19 11:55   ` Paolo Bonzini
  2016-02-14 11:31 ` [PATCH v3 11/11] KVM: MMU: apply page track notifier Xiao Guangrong
  2016-02-19 12:00 ` [PATCH v3 00/11] KVM: x86: track guest page access Paolo Bonzini
  11 siblings, 1 reply; 39+ messages in thread
From: Xiao Guangrong @ 2016-02-14 11:31 UTC (permalink / raw)
  To: pbonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song, Xiao Guangrong

If a page fault is caused by a write access to a write-tracked page,
the real shadow page walk is skipped, so we lose the chance to clear
the write-flooding count for the page structures the current vcpu is
using.

Fix it by locklessly walking the shadow page table to clear the
write-flooding count on the shadow page structures outside of mmu-lock.
For that purpose, the count is changed to atomic_t.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/mmu.c              | 22 ++++++++++++++++++++--
 arch/x86/kvm/paging_tmpl.h      |  4 +++-
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 282bc2f..254d103 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -277,7 +277,7 @@ struct kvm_mmu_page {
 #endif
 
 	/* Number of writes since the last time traversal visited this page.  */
-	int write_flooding_count;
+	atomic_t write_flooding_count;
 };
 
 struct kvm_pio_request {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4986615..f924e6c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2073,7 +2073,7 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
 
 static void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
 {
-	sp->write_flooding_count = 0;
+	atomic_set(&sp->write_flooding_count,  0);
 }
 
 static void clear_sp_write_flooding_count(u64 *spte)
@@ -3407,6 +3407,23 @@ static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
 	return false;
 }
 
+static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr)
+{
+	struct kvm_shadow_walk_iterator iterator;
+	u64 spte;
+
+	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
+		return;
+
+	walk_shadow_page_lockless_begin(vcpu);
+	for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) {
+		clear_sp_write_flooding_count(iterator.sptep);
+		if (!is_shadow_present_pte(spte))
+			break;
+	}
+	walk_shadow_page_lockless_end(vcpu);
+}
+
 static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
 				u32 error_code, bool prefault)
 {
@@ -4236,7 +4253,8 @@ static bool detect_write_flooding(struct kvm_mmu_page *sp)
 	if (sp->role.level == PT_PAGE_TABLE_LEVEL)
 		return false;
 
-	return ++sp->write_flooding_count >= 3;
+	atomic_inc(&sp->write_flooding_count);
+	return atomic_read(&sp->write_flooding_count) >= 3;
 }
 
 /*
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index c3a30c2..5985156 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -735,8 +735,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
 		return 0;
 	}
 
-	if (page_fault_handle_page_track(vcpu, error_code, walker.gfn))
+	if (page_fault_handle_page_track(vcpu, error_code, walker.gfn)) {
+		shadow_page_table_clear_flood(vcpu, addr);
 		return 1;
+	}
 
 	vcpu->arch.write_fault_to_shadow_pgtable = false;
 
-- 
1.8.3.1


* [PATCH v3 11/11] KVM: MMU: apply page track notifier
  2016-02-14 11:31 [PATCH v3 00/11] KVM: x86: track guest page access Xiao Guangrong
                   ` (9 preceding siblings ...)
  2016-02-14 11:31 ` [PATCH v3 10/11] KVM: MMU: clear write-flooding on the fast path of tracked page Xiao Guangrong
@ 2016-02-14 11:31 ` Xiao Guangrong
  2016-02-19 11:56   ` Paolo Bonzini
  2016-02-19 12:00 ` [PATCH v3 00/11] KVM: x86: track guest page access Paolo Bonzini
  11 siblings, 1 reply; 39+ messages in thread
From: Xiao Guangrong @ 2016-02-14 11:31 UTC (permalink / raw)
  To: pbonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song, Xiao Guangrong

Register the notifier to receive write-track events so that we can
update our shadow page tables.

kvm_mmu_pte_write() becomes the callback of the notifier; there is no
functional change.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |  5 +++--
 arch/x86/kvm/mmu.c              | 19 +++++++++++++++++--
 arch/x86/kvm/x86.c              |  4 ++--
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 254d103..5246f07 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -696,6 +696,7 @@ struct kvm_arch {
 	 */
 	struct list_head active_mmu_pages;
 	struct list_head zapped_obsolete_pages;
+	struct kvm_page_track_notifier_node mmu_sp_tracker;
 	struct kvm_page_track_notifier_head track_notifier_head;
 
 	struct list_head assigned_dev_head;
@@ -994,6 +995,8 @@ void kvm_mmu_module_exit(void);
 void kvm_mmu_destroy(struct kvm_vcpu *vcpu);
 int kvm_mmu_create(struct kvm_vcpu *vcpu);
 void kvm_mmu_setup(struct kvm_vcpu *vcpu);
+void kvm_mmu_init_vm(struct kvm *kvm);
+void kvm_mmu_uninit_vm(struct kvm *kvm);
 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 		u64 dirty_mask, u64 nx_mask, u64 x_mask);
 
@@ -1133,8 +1136,6 @@ void kvm_pic_clear_all(struct kvm_pic *pic, int irq_source_id);
 
 void kvm_inject_nmi(struct kvm_vcpu *vcpu);
 
-void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-		       const u8 *new, int bytes);
 int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
 int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva);
 void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f924e6c..57cf30b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4316,8 +4316,8 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
 	return spte;
 }
 
-void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-		       const u8 *new, int bytes)
+static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
+			      const u8 *new, int bytes)
 {
 	gfn_t gfn = gpa >> PAGE_SHIFT;
 	struct kvm_mmu_page *sp;
@@ -4531,6 +4531,21 @@ void kvm_mmu_setup(struct kvm_vcpu *vcpu)
 	init_kvm_mmu(vcpu);
 }
 
+void kvm_mmu_init_vm(struct kvm *kvm)
+{
+	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
+
+	node->track_write = kvm_mmu_pte_write;
+	kvm_page_track_register_notifier(kvm, node);
+}
+
+void kvm_mmu_uninit_vm(struct kvm *kvm)
+{
+	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
+
+	kvm_page_track_unregister_notifier(kvm, node);
+}
+
 /* The return value indicates if tlb flush on all vcpus is needed. */
 typedef bool (*slot_level_handler) (struct kvm *kvm, struct kvm_rmap_head *rmap_head);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 98019b6..319d572 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4369,7 +4369,6 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
 	ret = kvm_vcpu_write_guest(vcpu, gpa, val, bytes);
 	if (ret < 0)
 		return 0;
-	kvm_mmu_pte_write(vcpu, gpa, val, bytes);
 	kvm_page_track_write(vcpu, gpa, val, bytes);
 	return 1;
 }
@@ -4628,7 +4627,6 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
 		return X86EMUL_CMPXCHG_FAILED;
 
 	kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
-	kvm_mmu_pte_write(vcpu, gpa, new, bytes);
 	kvm_page_track_write(vcpu, gpa, new, bytes);
 
 	return X86EMUL_CONTINUE;
@@ -7751,6 +7749,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn);
 
 	kvm_page_track_init(kvm);
+	kvm_mmu_init_vm(kvm);
 
 	return 0;
 }
@@ -7878,6 +7877,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kfree(kvm->arch.vioapic);
 	kvm_free_vcpus(kvm);
 	kfree(rcu_dereference_check(kvm->arch.apic_map, 1));
+	kvm_mmu_uninit_vm(kvm);
 }
 
 void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
-- 
1.8.3.1


* Re: [PATCH v3 01/11] KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowed
  2016-02-14 11:31 ` [PATCH v3 01/11] KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowed Xiao Guangrong
@ 2016-02-19 11:08   ` Paolo Bonzini
  0 siblings, 0 replies; 39+ messages in thread
From: Paolo Bonzini @ 2016-02-19 11:08 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song



On 14/02/2016 12:31, Xiao Guangrong wrote:
> kvm_lpage_info->write_count is used to detect if the large page mapping
> for the gfn on the specified level is allowed, rename it to disallow_lpage
> to reflect its purpose, also we rename has_wrprotected_page() to
> mmu_gfn_lpage_is_disallowed() to make the code more clearer
> 
> Later we will extend this mechanism for page tracking: if the gfn is
> tracked then large mapping for that gfn on any level is not allowed.
> The new name is more straightforward
> 
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>  Documentation/virtual/kvm/mmu.txt |  6 +++---
>  arch/x86/include/asm/kvm_host.h   |  2 +-
>  arch/x86/kvm/mmu.c                | 25 +++++++++++++------------
>  arch/x86/kvm/x86.c                | 14 ++++++++------
>  4 files changed, 25 insertions(+), 22 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
> index daf9c0f..dda2e93 100644
> --- a/Documentation/virtual/kvm/mmu.txt
> +++ b/Documentation/virtual/kvm/mmu.txt
> @@ -391,11 +391,11 @@ To instantiate a large spte, four constraints must be satisfied:
>    write-protected pages
>  - the guest page must be wholly contained by a single memory slot
>  
> -To check the last two conditions, the mmu maintains a ->write_count set of
> +To check the last two conditions, the mmu maintains a ->disallow_lpage set of
>  arrays for each memory slot and large page size.  Every write protected page
> -causes its write_count to be incremented, thus preventing instantiation of
> +causes its disallow_lpage to be incremented, thus preventing instantiation of
>  a large spte.  The frames at the end of an unaligned memory slot have
> -artificially inflated ->write_counts so they can never be instantiated.
> +artificially inflated ->disallow_lpages so they can never be instantiated.
>  
>  Zapping all pages (page generation count)
>  =========================================
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 7dd6d55..e1c1f57 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -644,7 +644,7 @@ struct kvm_vcpu_arch {
>  };
>  
>  struct kvm_lpage_info {
> -	int write_count;
> +	int disallow_lpage;
>  };
>  
>  struct kvm_arch_memory_slot {
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 95a955d..de9e992 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -789,7 +789,7 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
>  	slot = __gfn_to_memslot(slots, gfn);
>  	for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
>  		linfo = lpage_info_slot(gfn, slot, i);
> -		linfo->write_count += 1;
> +		linfo->disallow_lpage += 1;
>  	}
>  	kvm->arch.indirect_shadow_pages++;
>  }
> @@ -807,31 +807,32 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
>  	slot = __gfn_to_memslot(slots, gfn);
>  	for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
>  		linfo = lpage_info_slot(gfn, slot, i);
> -		linfo->write_count -= 1;
> -		WARN_ON(linfo->write_count < 0);
> +		linfo->disallow_lpage -= 1;
> +		WARN_ON(linfo->disallow_lpage < 0);
>  	}
>  	kvm->arch.indirect_shadow_pages--;
>  }
>  
> -static int __has_wrprotected_page(gfn_t gfn, int level,
> -				  struct kvm_memory_slot *slot)
> +static bool __mmu_gfn_lpage_is_disallowed(gfn_t gfn, int level,
> +					  struct kvm_memory_slot *slot)
>  {
>  	struct kvm_lpage_info *linfo;
>  
>  	if (slot) {
>  		linfo = lpage_info_slot(gfn, slot, level);
> -		return linfo->write_count;
> +		return !!linfo->disallow_lpage;
>  	}
>  
> -	return 1;
> +	return true;
>  }
>  
> -static int has_wrprotected_page(struct kvm_vcpu *vcpu, gfn_t gfn, int level)
> +static bool mmu_gfn_lpage_is_disallowed(struct kvm_vcpu *vcpu, gfn_t gfn,
> +					int level)
>  {
>  	struct kvm_memory_slot *slot;
>  
>  	slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
> -	return __has_wrprotected_page(gfn, level, slot);
> +	return __mmu_gfn_lpage_is_disallowed(gfn, level, slot);
>  }
>  
>  static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
> @@ -897,7 +898,7 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn,
>  	max_level = min(kvm_x86_ops->get_lpage_level(), host_level);
>  
>  	for (level = PT_DIRECTORY_LEVEL; level <= max_level; ++level)
> -		if (__has_wrprotected_page(large_gfn, level, slot))
> +		if (__mmu_gfn_lpage_is_disallowed(large_gfn, level, slot))
>  			break;
>  
>  	return level - 1;
> @@ -2503,7 +2504,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  		 * be fixed if guest refault.
>  		 */
>  		if (level > PT_PAGE_TABLE_LEVEL &&
> -		    has_wrprotected_page(vcpu, gfn, level))
> +		    mmu_gfn_lpage_is_disallowed(vcpu, gfn, level))
>  			goto done;
>  
>  		spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;
> @@ -2768,7 +2769,7 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
>  	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) &&
>  	    level == PT_PAGE_TABLE_LEVEL &&
>  	    PageTransCompound(pfn_to_page(pfn)) &&
> -	    !has_wrprotected_page(vcpu, gfn, PT_DIRECTORY_LEVEL)) {
> +	    !mmu_gfn_lpage_is_disallowed(vcpu, gfn, PT_DIRECTORY_LEVEL)) {
>  		unsigned long mask;
>  		/*
>  		 * mmu_notifier_retry was successful and we hold the
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index cf15bc5..f448e64 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7903,6 +7903,7 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
>  	int i;
>  
>  	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
> +		struct kvm_lpage_info *linfo;
>  		unsigned long ugfn;
>  		int lpages;
>  		int level = i + 1;
> @@ -7917,15 +7918,16 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
>  		if (i == 0)
>  			continue;
>  
> -		slot->arch.lpage_info[i - 1] = kvm_kvzalloc(lpages *
> -					sizeof(*slot->arch.lpage_info[i - 1]));
> -		if (!slot->arch.lpage_info[i - 1])
> +		linfo = kvm_kvzalloc(lpages * sizeof(*linfo));
> +		if (!linfo)
>  			goto out_free;
>  
> +		slot->arch.lpage_info[i - 1] = linfo;
> +
>  		if (slot->base_gfn & (KVM_PAGES_PER_HPAGE(level) - 1))
> -			slot->arch.lpage_info[i - 1][0].write_count = 1;
> +			linfo[0].disallow_lpage = 1;
>  		if ((slot->base_gfn + npages) & (KVM_PAGES_PER_HPAGE(level) - 1))
> -			slot->arch.lpage_info[i - 1][lpages - 1].write_count = 1;
> +			linfo[lpages - 1].disallow_lpage = 1;
>  		ugfn = slot->userspace_addr >> PAGE_SHIFT;
>  		/*
>  		 * If the gfn and userspace address are not aligned wrt each
> @@ -7937,7 +7939,7 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
>  			unsigned long j;
>  
>  			for (j = 0; j < lpages; ++j)
> -				slot->arch.lpage_info[i - 1][j].write_count = 1;
> +				linfo[j].disallow_lpage = 1;
>  		}
>  	}
>  
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 02/11] KVM: MMU: introduce kvm_mmu_gfn_{allow,disallow}_lpage
  2016-02-14 11:31 ` [PATCH v3 02/11] KVM: MMU: introduce kvm_mmu_gfn_{allow,disallow}_lpage Xiao Guangrong
@ 2016-02-19 11:09   ` Paolo Bonzini
  0 siblings, 0 replies; 39+ messages in thread
From: Paolo Bonzini @ 2016-02-19 11:09 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song



On 14/02/2016 12:31, Xiao Guangrong wrote:
> Abstract the common operations from account_shadowed() and
> unaccount_shadowed(), then introduce kvm_mmu_gfn_disallow_lpage()
> and kvm_mmu_gfn_allow_lpage()
> 
> These two functions will be used by page tracking in the later patch
> 
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>  arch/x86/kvm/mmu.c | 38 +++++++++++++++++++++++++-------------
>  arch/x86/kvm/mmu.h |  3 +++
>  2 files changed, 28 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index de9e992..e1bb66c 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -776,21 +776,39 @@ static struct kvm_lpage_info *lpage_info_slot(gfn_t gfn,
>  	return &slot->arch.lpage_info[level - 2][idx];
>  }
>  
> +static void update_gfn_disallow_lpage_count(struct kvm_memory_slot *slot,
> +					    gfn_t gfn, int count)
> +{
> +	struct kvm_lpage_info *linfo;
> +	int i;
> +
> +	for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
> +		linfo = lpage_info_slot(gfn, slot, i);
> +		linfo->disallow_lpage += count;
> +		WARN_ON(linfo->disallow_lpage < 0);
> +	}
> +}
> +
> +void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn)
> +{
> +	update_gfn_disallow_lpage_count(slot, gfn, 1);
> +}
> +
> +void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn)
> +{
> +	update_gfn_disallow_lpage_count(slot, gfn, -1);
> +}
> +
>  static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
>  	struct kvm_memslots *slots;
>  	struct kvm_memory_slot *slot;
> -	struct kvm_lpage_info *linfo;
>  	gfn_t gfn;
> -	int i;
>  
>  	gfn = sp->gfn;
>  	slots = kvm_memslots_for_spte_role(kvm, sp->role);
>  	slot = __gfn_to_memslot(slots, gfn);
> -	for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
> -		linfo = lpage_info_slot(gfn, slot, i);
> -		linfo->disallow_lpage += 1;
> -	}
> +	kvm_mmu_gfn_disallow_lpage(slot, gfn);
>  	kvm->arch.indirect_shadow_pages++;
>  }
>  
> @@ -798,18 +816,12 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
>  	struct kvm_memslots *slots;
>  	struct kvm_memory_slot *slot;
> -	struct kvm_lpage_info *linfo;
>  	gfn_t gfn;
> -	int i;
>  
>  	gfn = sp->gfn;
>  	slots = kvm_memslots_for_spte_role(kvm, sp->role);
>  	slot = __gfn_to_memslot(slots, gfn);
> -	for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
> -		linfo = lpage_info_slot(gfn, slot, i);
> -		linfo->disallow_lpage -= 1;
> -		WARN_ON(linfo->disallow_lpage < 0);
> -	}
> +	kvm_mmu_gfn_allow_lpage(slot, gfn);
>  	kvm->arch.indirect_shadow_pages--;
>  }
>  
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 55ffb7b..de92bed 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -174,4 +174,7 @@ static inline bool permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  
>  void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm);
>  void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
> +
> +void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
> +void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
>  #endif
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 03/11] KVM: MMU: introduce kvm_mmu_slot_gfn_write_protect
  2016-02-14 11:31 ` [PATCH v3 03/11] KVM: MMU: introduce kvm_mmu_slot_gfn_write_protect Xiao Guangrong
@ 2016-02-19 11:18   ` Paolo Bonzini
  0 siblings, 0 replies; 39+ messages in thread
From: Paolo Bonzini @ 2016-02-19 11:18 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song



On 14/02/2016 12:31, Xiao Guangrong wrote:
> Split rmap_write_protect() and introduce the function to abstract the write
> protection based on the slot
> 
> This function will be used in the later patch
> 
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>  arch/x86/kvm/mmu.c | 16 +++++++++++-----
>  arch/x86/kvm/mmu.h |  2 ++
>  2 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index e1bb66c..edad3c7 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1336,23 +1336,29 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>  		kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
>  }
>  
> -static bool rmap_write_protect(struct kvm_vcpu *vcpu, u64 gfn)
> +bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
> +				    struct kvm_memory_slot *slot, u64 gfn)
>  {
> -	struct kvm_memory_slot *slot;
>  	struct kvm_rmap_head *rmap_head;
>  	int i;
>  	bool write_protected = false;
>  
> -	slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
> -
>  	for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
>  		rmap_head = __gfn_to_rmap(gfn, i, slot);
> -		write_protected |= __rmap_write_protect(vcpu->kvm, rmap_head, true);
> +		write_protected |= __rmap_write_protect(kvm, rmap_head, true);
>  	}
>  
>  	return write_protected;
>  }
>  
> +static bool rmap_write_protect(struct kvm_vcpu *vcpu, u64 gfn)
> +{
> +	struct kvm_memory_slot *slot;
> +
> +	slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
> +	return kvm_mmu_slot_gfn_write_protect(vcpu->kvm, slot, gfn);
> +}
> +
>  static bool kvm_zap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
>  {
>  	u64 *sptep;
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index de92bed..58fe98a 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -177,4 +177,6 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
>  
>  void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
>  void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
> +bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
> +				    struct kvm_memory_slot *slot, u64 gfn);
>  #endif
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 04/11] KVM: page track: add the framework of guest page tracking
  2016-02-14 11:31 ` [PATCH v3 04/11] KVM: page track: add the framework of guest page tracking Xiao Guangrong
@ 2016-02-19 11:24   ` Paolo Bonzini
  2016-02-23  3:57     ` Xiao Guangrong
  0 siblings, 1 reply; 39+ messages in thread
From: Paolo Bonzini @ 2016-02-19 11:24 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song



On 14/02/2016 12:31, Xiao Guangrong wrote:
> The array, gfn_track[mode][gfn], is introduced in memory slot for every
> guest page; it is the tracking count for the guest page in different
> modes. If the page is tracked, the count is increased; the page is
> no longer tracked after the count reaches zero
> 
> We use 'unsigned short' as the tracking count, which should be enough as
> the shadow page table can only use 2^14 (2^3 for level, 2^1 for cr4_pae, 2^2
> for quadrant, 2^3 for access, 2^1 for nxe, 2^1 for cr0_wp, 2^1 for
> smep_andnot_wp, 2^1 for smap_andnot_wp, and 2^1 for smm) at most, there
> is enough room for other trackers
> 
> Two callbacks, kvm_page_track_create_memslot() and
> kvm_page_track_free_memslot() are implemented in this patch, they are
> internally used to initialize and reclaim the memory of the array
> 
> Currently, only write track mode is supported
> 
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h       |  2 ++
>  arch/x86/include/asm/kvm_page_track.h | 13 +++++++++
>  arch/x86/kvm/Makefile                 |  3 +-
>  arch/x86/kvm/page_track.c             | 52 +++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c                    |  5 ++++
>  5 files changed, 74 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/include/asm/kvm_page_track.h
>  create mode 100644 arch/x86/kvm/page_track.c
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index e1c1f57..d8931d0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -32,6 +32,7 @@
>  #include <asm/mtrr.h>
>  #include <asm/msr-index.h>
>  #include <asm/asm.h>
> +#include <asm/kvm_page_track.h>
>  
>  #define KVM_MAX_VCPUS 255
>  #define KVM_SOFT_MAX_VCPUS 160
> @@ -650,6 +651,7 @@ struct kvm_lpage_info {
>  struct kvm_arch_memory_slot {
>  	struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES];
>  	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
> +	unsigned short *gfn_track[KVM_PAGE_TRACK_MAX];

Please add a comment at struct kvm_mmu_page_role mentioning that the
number of role bits for shadow pages (i.e. not counting direct and
invalid) must not exceed 15 (16 theoretically risks overflow already!),
and counting the 14 bits that are in use.

Paolo

>  };
>  
>  /*
> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> new file mode 100644
> index 0000000..55200406
> --- /dev/null
> +++ b/arch/x86/include/asm/kvm_page_track.h
> @@ -0,0 +1,13 @@
> +#ifndef _ASM_X86_KVM_PAGE_TRACK_H
> +#define _ASM_X86_KVM_PAGE_TRACK_H
> +
> +enum kvm_page_track_mode {
> +	KVM_PAGE_TRACK_WRITE,
> +	KVM_PAGE_TRACK_MAX,
> +};
> +
> +void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
> +				 struct kvm_memory_slot *dont);
> +int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
> +				  unsigned long npages);
> +#endif
> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> index a1ff508..464fa47 100644
> --- a/arch/x86/kvm/Makefile
> +++ b/arch/x86/kvm/Makefile
> @@ -13,9 +13,10 @@ kvm-$(CONFIG_KVM_ASYNC_PF)	+= $(KVM)/async_pf.o
>  
>  kvm-y			+= x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
>  			   i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
> -			   hyperv.o
> +			   hyperv.o page_track.o
>  
>  kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)	+= assigned-dev.o iommu.o
> +
>  kvm-intel-y		+= vmx.o pmu_intel.o
>  kvm-amd-y		+= svm.o pmu_amd.o
>  
> diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
> new file mode 100644
> index 0000000..8c396d0
> --- /dev/null
> +++ b/arch/x86/kvm/page_track.c
> @@ -0,0 +1,52 @@
> +/*
> + * Support KVM guest page tracking
> + *
> + * This feature allows us to track page access in guest. Currently, only
> + * write access is tracked.
> + *
> + * Copyright(C) 2015 Intel Corporation.
> + *
> + * Author:
> + *   Xiao Guangrong <guangrong.xiao@linux.intel.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + */
> +
> +#include <linux/kvm_host.h>
> +#include <asm/kvm_host.h>
> +#include <asm/kvm_page_track.h>
> +
> +#include "mmu.h"
> +
> +void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
> +				 struct kvm_memory_slot *dont)
> +{
> +	int i;
> +
> +	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++)
> +		if (!dont || free->arch.gfn_track[i] !=
> +		      dont->arch.gfn_track[i]) {
> +			kvfree(free->arch.gfn_track[i]);
> +			free->arch.gfn_track[i] = NULL;
> +		}
> +}
> +
> +int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
> +				  unsigned long npages)
> +{
> +	int  i;
> +
> +	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
> +		slot->arch.gfn_track[i] = kvm_kvzalloc(npages *
> +					    sizeof(*slot->arch.gfn_track[i]));
> +		if (!slot->arch.gfn_track[i])
> +			goto track_free;
> +	}
> +
> +	return 0;
> +
> +track_free:
> +	kvm_page_track_free_memslot(slot, NULL);
> +	return -ENOMEM;
> +}
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f448e64..e25ebb7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7895,6 +7895,8 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
>  			free->arch.lpage_info[i - 1] = NULL;
>  		}
>  	}
> +
> +	kvm_page_track_free_memslot(free, dont);
>  }
>  
>  int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
> @@ -7943,6 +7945,9 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
>  		}
>  	}
>  
> +	if (kvm_page_track_create_memslot(slot, npages))
> +		goto out_free;
> +
>  	return 0;
>  
>  out_free:
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 05/11] KVM: page track: introduce kvm_page_track_{add,remove}_page
  2016-02-14 11:31 ` [PATCH v3 05/11] KVM: page track: introduce kvm_page_track_{add,remove}_page Xiao Guangrong
@ 2016-02-19 11:37   ` Paolo Bonzini
  2016-02-23  4:18     ` Xiao Guangrong
  2016-02-19 11:37   ` Paolo Bonzini
  1 sibling, 1 reply; 39+ messages in thread
From: Paolo Bonzini @ 2016-02-19 11:37 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song



On 14/02/2016 12:31, Xiao Guangrong wrote:
> +	/* does tracking count wrap? */
> +	WARN_ON((count > 0) && (val + count < val));

This doesn't work, because "val + count" is an int.

> +	/* the last tracker has already gone? */
> +	WARN_ON((count < 0) && (val < !count));

Also, here any underflow should warn.

You can actually use the fact that val + count is an int like this:

    WARN_ON(val + count < 0 || val + count > USHRT_MAX)

and also please return if the warning fires.

> +void kvm_page_track_add_page(struct kvm *kvm, gfn_t gfn,
> +			     enum kvm_page_track_mode mode)
> +{
> +	struct kvm_memslots *slots;
> +	struct kvm_memory_slot *slot;
> +	int i;
> +
> +	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> +		slots = __kvm_memslots(kvm, i);
> +
> +		slot = __gfn_to_memslot(slots, gfn);
> +		if (!slot)
> +			continue;
> +
> +		spin_lock(&kvm->mmu_lock);
> +		kvm_slot_page_track_add_page_nolock(kvm, slot, gfn, mode);
> +		spin_unlock(&kvm->mmu_lock);
> +	}
> +}

I don't think it is right to walk all address spaces.  The good news is
that you're not using kvm_page_track_{add,remove}_page at all as far as
I can see, so you can just remove them.

Also, when you will need it, I think it's better to move the
spin_lock/spin_unlock pair outside the for loop.  With this change,
perhaps it's better to leave it to the caller completely---but I cannot
say until I see the caller.

In the meanwhile, please leave out _nolock from the other functions' name.

Paolo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 05/11] KVM: page track: introduce kvm_page_track_{add,remove}_page
  2016-02-14 11:31 ` [PATCH v3 05/11] KVM: page track: introduce kvm_page_track_{add,remove}_page Xiao Guangrong
  2016-02-19 11:37   ` Paolo Bonzini
@ 2016-02-19 11:37   ` Paolo Bonzini
  2016-02-23  4:18     ` Xiao Guangrong
  1 sibling, 1 reply; 39+ messages in thread
From: Paolo Bonzini @ 2016-02-19 11:37 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song



On 14/02/2016 12:31, Xiao Guangrong wrote:
> +static bool check_mode(enum kvm_page_track_mode mode)
> +{
> +	if (mode < 0 || mode >= KVM_PAGE_TRACK_MAX)
> +		return false;
> +
> +	return true;
> +}

Oops, forgot about this; please rename to page_track_mode_is_valid and
make it "static inline".

Paolo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 06/11] KVM: MMU: let page fault handler be aware tracked page
  2016-02-14 11:31 ` [PATCH v3 06/11] KVM: MMU: let page fault handler be aware tracked page Xiao Guangrong
@ 2016-02-19 11:45   ` Paolo Bonzini
  2016-02-23  4:19     ` Xiao Guangrong
  0 siblings, 1 reply; 39+ messages in thread
From: Paolo Bonzini @ 2016-02-19 11:45 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song



On 14/02/2016 12:31, Xiao Guangrong wrote:
> +/*
> + * check if the corresponding access on the specified guest page is tracked.
> + */
> +bool kvm_page_track_check_mode(struct kvm_vcpu *vcpu, gfn_t gfn,
> +			       enum kvm_page_track_mode mode)

Please rename to kvm_page_track_is_active.

Paolo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 07/11] KVM: page track: add notifier support
  2016-02-14 11:31 ` [PATCH v3 07/11] KVM: page track: add notifier support Xiao Guangrong
@ 2016-02-19 11:51   ` Paolo Bonzini
  2016-02-23  4:34     ` Xiao Guangrong
  0 siblings, 1 reply; 39+ messages in thread
From: Paolo Bonzini @ 2016-02-19 11:51 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song



On 14/02/2016 12:31, Xiao Guangrong wrote:
> A notifier list is introduced so that any node that wants to receive the
> track event can register on the list
> 
> Two APIs are introduced here:
> - kvm_page_track_register_notifier(): register the notifier to receive
>   track event
> 
> - kvm_page_track_unregister_notifier(): stop receiving track event by
>   unregistering the notifier
> 
> The callback, node->track_write() is called when a write access on the
> write tracked page happens
> 
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h       |  1 +
>  arch/x86/include/asm/kvm_page_track.h | 39 ++++++++++++++++++++
>  arch/x86/kvm/page_track.c             | 67 +++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c                    |  4 +++
>  4 files changed, 111 insertions(+)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index d8931d0..282bc2f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -696,6 +696,7 @@ struct kvm_arch {
>  	 */
>  	struct list_head active_mmu_pages;
>  	struct list_head zapped_obsolete_pages;
> +	struct kvm_page_track_notifier_head track_notifier_head;
>  
>  	struct list_head assigned_dev_head;
>  	struct iommu_domain *iommu_domain;
> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> index 97ac9c3..1aae4ef 100644
> --- a/arch/x86/include/asm/kvm_page_track.h
> +++ b/arch/x86/include/asm/kvm_page_track.h
> @@ -6,6 +6,36 @@ enum kvm_page_track_mode {
>  	KVM_PAGE_TRACK_MAX,
>  };
>  
> +/*
> + * The notifier represented by @kvm_page_track_notifier_node is linked into
> + * the head which will be notified when guest is triggering the track event.
> + *
> + * Write access on the head is protected by kvm->mmu_lock, read access
> + * is protected by track_srcu.
> + */
> +struct kvm_page_track_notifier_head {
> +	struct srcu_struct track_srcu;
> +	struct hlist_head track_notifier_list;
> +};
> +
> +struct kvm_page_track_notifier_node {
> +	struct hlist_node node;
> +
> +	/*
> +	 * It is called when guest is writing the write-tracked page
> +	 * and write emulation is finished at that time.
> +	 *
> +	 * @vcpu: the vcpu where the write access happened.
> +	 * @gpa: the physical address written by guest.
> +	 * @new: the data was written to the address.
> +	 * @bytes: the written length.
> +	 */
> +	void (*track_write)(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> +			    int bytes);
> +};
> +
> +void kvm_page_track_init(struct kvm *kvm);
> +
>  void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
>  				 struct kvm_memory_slot *dont);
>  int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
> @@ -25,4 +55,13 @@ void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
>  				enum kvm_page_track_mode mode);
>  bool kvm_page_track_check_mode(struct kvm_vcpu *vcpu, gfn_t gfn,
>  			       enum kvm_page_track_mode mode);
> +
> +void
> +kvm_page_track_register_notifier(struct kvm *kvm,
> +				 struct kvm_page_track_notifier_node *n);
> +void
> +kvm_page_track_unregister_notifier(struct kvm *kvm,
> +				   struct kvm_page_track_notifier_node *n);
> +void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> +			  int bytes);
>  #endif
> diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
> index de9b32f..0692cc6 100644
> --- a/arch/x86/kvm/page_track.c
> +++ b/arch/x86/kvm/page_track.c
> @@ -188,3 +188,70 @@ bool kvm_page_track_check_mode(struct kvm_vcpu *vcpu, gfn_t gfn,
>  
>  	return !!ACCESS_ONCE(slot->arch.gfn_track[mode][index]);
>  }
> +
> +void kvm_page_track_init(struct kvm *kvm)
> +{
> +	struct kvm_page_track_notifier_head *head;
> +
> +	head = &kvm->arch.track_notifier_head;
> +	init_srcu_struct(&head->track_srcu);
> +	INIT_HLIST_HEAD(&head->track_notifier_list);
> +}
> +
> +/*
> + * register the notifier so that event interception for the tracked guest
> + * pages can be received.
> + */
> +void
> +kvm_page_track_register_notifier(struct kvm *kvm,
> +				 struct kvm_page_track_notifier_node *n)
> +{
> +	struct kvm_page_track_notifier_head *head;
> +
> +	head = &kvm->arch.track_notifier_head;
> +
> +	spin_lock(&kvm->mmu_lock);
> +	hlist_add_head_rcu(&n->node, &head->track_notifier_list);
> +	spin_unlock(&kvm->mmu_lock);
> +}
> +
> +/*
> + * stop receiving the event interception. It is the opposite operation of
> + * kvm_page_track_register_notifier().
> + */
> +void
> +kvm_page_track_unregister_notifier(struct kvm *kvm,
> +				   struct kvm_page_track_notifier_node *n)
> +{
> +	struct kvm_page_track_notifier_head *head;
> +
> +	head = &kvm->arch.track_notifier_head;
> +
> +	spin_lock(&kvm->mmu_lock);
> +	hlist_del_rcu(&n->node);
> +	spin_unlock(&kvm->mmu_lock);
> +	synchronize_srcu(&head->track_srcu);
> +}
> +
> +/*
> + * Notify the node that write access is intercepted and write emulation is
> + * finished at this time.
> + *
> + * The node should figure out if the written page is the one that node is
> + * interested in by itself.
> + */
> +void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> +			  int bytes)
> +{
> +	struct kvm_page_track_notifier_head *head;
> +	struct kvm_page_track_notifier_node *n;
> +	int idx;
> +
> +	head = &vcpu->kvm->arch.track_notifier_head;

Please check outside SRCU if the notifier list is empty.  If so, there
is no need to do the (relatively) expensive srcu_read_lock/unlock.

Paolo

> +	idx = srcu_read_lock(&head->track_srcu);
> +	hlist_for_each_entry_rcu(n, &head->track_notifier_list, node)
> +		if (n->track_write)
> +			n->track_write(vcpu, gpa, new, bytes);
> +	srcu_read_unlock(&head->track_srcu, idx);
> +}
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e25ebb7..98019b6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4370,6 +4370,7 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
>  	if (ret < 0)
>  		return 0;

A kvm_vcpu_mark_page_dirty is missing here, isn't it?  I can take care
of it, but it would be great if you double-checked this.  If so, that
should be fixed in stable kernels too.

Can you add a kvm_vcpu_note_page_write(vcpu, gpa, val, bytes) function
that takes care of calling kvm_vcpu_mark_page_dirty, kvm_mmu_pte_write
and kvm_page_track_write?

Thanks,

Paolo

>  	kvm_mmu_pte_write(vcpu, gpa, val, bytes);
> +	kvm_page_track_write(vcpu, gpa, val, bytes);
>  	return 1;
>  }
>  
> @@ -4628,6 +4629,7 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
>  
>  	kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
>  	kvm_mmu_pte_write(vcpu, gpa, new, bytes);
> +	kvm_page_track_write(vcpu, gpa, new, bytes);
>  
>  	return X86EMUL_CONTINUE;
>  
> @@ -7748,6 +7750,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn);
>  	INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn);
>  
> +	kvm_page_track_init(kvm);
> +
>  	return 0;
>  }
>  
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 10/11] KVM: MMU: clear write-flooding on the fast path of tracked page
  2016-02-14 11:31 ` [PATCH v3 10/11] KVM: MMU: clear write-flooding on the fast path of tracked page Xiao Guangrong
@ 2016-02-19 11:55   ` Paolo Bonzini
  2016-02-23  4:36     ` Xiao Guangrong
  0 siblings, 1 reply; 39+ messages in thread
From: Paolo Bonzini @ 2016-02-19 11:55 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song



On 14/02/2016 12:31, Xiao Guangrong wrote:
> If the page fault is caused by write access on a write-tracked page, the
> real shadow page walk is skipped, so we lose the chance to clear write
> flooding for the page structure the current vcpu is using
> 
> Fix it by locklessly walking the shadow page table to clear write flooding
> on the shadow page structure outside of mmu-lock. To allow this, the
> count is changed to atomic_t

Should this be moved earlier in the series, so that the issue never
surfaces?

Paolo

> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  2 +-
>  arch/x86/kvm/mmu.c              | 22 ++++++++++++++++++++--
>  arch/x86/kvm/paging_tmpl.h      |  4 +++-
>  3 files changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 282bc2f..254d103 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -277,7 +277,7 @@ struct kvm_mmu_page {
>  #endif
>  
>  	/* Number of writes since the last time traversal visited this page.  */
> -	int write_flooding_count;
> +	atomic_t write_flooding_count;
>  };
>  
>  struct kvm_pio_request {
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 4986615..f924e6c 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2073,7 +2073,7 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
>  
>  static void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
>  {
> -	sp->write_flooding_count = 0;
> +	atomic_set(&sp->write_flooding_count,  0);
>  }
>  
>  static void clear_sp_write_flooding_count(u64 *spte)
> @@ -3407,6 +3407,23 @@ static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
>  	return false;
>  }
>  
> +static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr)
> +{
> +	struct kvm_shadow_walk_iterator iterator;
> +	u64 spte;
> +
> +	if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
> +		return;
> +
> +	walk_shadow_page_lockless_begin(vcpu);
> +	for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) {
> +		clear_sp_write_flooding_count(iterator.sptep);
> +		if (!is_shadow_present_pte(spte))
> +			break;
> +	}
> +	walk_shadow_page_lockless_end(vcpu);
> +}
> +
>  static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
>  				u32 error_code, bool prefault)
>  {
> @@ -4236,7 +4253,8 @@ static bool detect_write_flooding(struct kvm_mmu_page *sp)
>  	if (sp->role.level == PT_PAGE_TABLE_LEVEL)
>  		return false;
>  
> -	return ++sp->write_flooding_count >= 3;
> +	atomic_inc(&sp->write_flooding_count);
> +	return atomic_read(&sp->write_flooding_count) >= 3;
>  }
>  
>  /*
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index c3a30c2..5985156 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -735,8 +735,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
>  		return 0;
>  	}
>  
> -	if (page_fault_handle_page_track(vcpu, error_code, walker.gfn))
> +	if (page_fault_handle_page_track(vcpu, error_code, walker.gfn)) {
> +		shadow_page_table_clear_flood(vcpu, addr);
>  		return 1;
> +	}
>  
>  	vcpu->arch.write_fault_to_shadow_pgtable = false;
>  
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 11/11] KVM: MMU: apply page track notifier
  2016-02-14 11:31 ` [PATCH v3 11/11] KVM: MMU: apply page track notifier Xiao Guangrong
@ 2016-02-19 11:56   ` Paolo Bonzini
  2016-02-23  4:40     ` Xiao Guangrong
  0 siblings, 1 reply; 39+ messages in thread
From: Paolo Bonzini @ 2016-02-19 11:56 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song



On 14/02/2016 12:31, Xiao Guangrong wrote:
> Register the notifier to receive the write track event so that we can update
> our shadow page table
> 
> It makes kvm_mmu_pte_write() the callback of the notifier; no functional
> change is intended
> 
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  5 +++--
>  arch/x86/kvm/mmu.c              | 19 +++++++++++++++++--
>  arch/x86/kvm/x86.c              |  4 ++--
>  3 files changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 254d103..5246f07 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -696,6 +696,7 @@ struct kvm_arch {
>  	 */
>  	struct list_head active_mmu_pages;
>  	struct list_head zapped_obsolete_pages;
> +	struct kvm_page_track_notifier_node mmu_sp_tracker;
>  	struct kvm_page_track_notifier_head track_notifier_head;
>  
>  	struct list_head assigned_dev_head;
> @@ -994,6 +995,8 @@ void kvm_mmu_module_exit(void);
>  void kvm_mmu_destroy(struct kvm_vcpu *vcpu);
>  int kvm_mmu_create(struct kvm_vcpu *vcpu);
>  void kvm_mmu_setup(struct kvm_vcpu *vcpu);
> +void kvm_mmu_init_vm(struct kvm *kvm);
> +void kvm_mmu_uninit_vm(struct kvm *kvm);
>  void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
>  		u64 dirty_mask, u64 nx_mask, u64 x_mask);
>  
> @@ -1133,8 +1136,6 @@ void kvm_pic_clear_all(struct kvm_pic *pic, int irq_source_id);
>  
>  void kvm_inject_nmi(struct kvm_vcpu *vcpu);
>  
> -void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
> -		       const u8 *new, int bytes);
>  int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
>  int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva);
>  void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index f924e6c..57cf30b 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -4316,8 +4316,8 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
>  	return spte;
>  }
>  
> -void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
> -		       const u8 *new, int bytes)
> +static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
> +			      const u8 *new, int bytes)
>  {
>  	gfn_t gfn = gpa >> PAGE_SHIFT;
>  	struct kvm_mmu_page *sp;
> @@ -4531,6 +4531,21 @@ void kvm_mmu_setup(struct kvm_vcpu *vcpu)
>  	init_kvm_mmu(vcpu);
>  }
>  
> +void kvm_mmu_init_vm(struct kvm *kvm)
> +{
> +	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
> +
> +	node->track_write = kvm_mmu_pte_write;
> +	kvm_page_track_register_notifier(kvm, node);
> +}
> +
> +void kvm_mmu_uninit_vm(struct kvm *kvm)
> +{
> +	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
> +
> +	kvm_page_track_unregister_notifier(kvm, node);
> +}
> +
>  /* The return value indicates if tlb flush on all vcpus is needed. */
>  typedef bool (*slot_level_handler) (struct kvm *kvm, struct kvm_rmap_head *rmap_head);
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 98019b6..319d572 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4369,7 +4369,6 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
>  	ret = kvm_vcpu_write_guest(vcpu, gpa, val, bytes);
>  	if (ret < 0)
>  		return 0;
> -	kvm_mmu_pte_write(vcpu, gpa, val, bytes);
>  	kvm_page_track_write(vcpu, gpa, val, bytes);
>  	return 1;
>  }
> @@ -4628,7 +4627,6 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
>  		return X86EMUL_CMPXCHG_FAILED;
>  
>  	kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
> -	kvm_mmu_pte_write(vcpu, gpa, new, bytes);
>  	kvm_page_track_write(vcpu, gpa, new, bytes);
>  
>  	return X86EMUL_CONTINUE;
> @@ -7751,6 +7749,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn);
>  
>  	kvm_page_track_init(kvm);
> +	kvm_mmu_init_vm(kvm);
>  
>  	return 0;
>  }
> @@ -7878,6 +7877,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>  	kfree(kvm->arch.vioapic);
>  	kvm_free_vcpus(kvm);
>  	kfree(rcu_dereference_check(kvm->arch.apic_map, 1));
> +	kvm_mmu_uninit_vm(kvm);

This function is not necessary, since the VM is disappearing anyway and
the page tracker is not going to be called.

Paolo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 00/11] KVM: x86: track guest page access
  2016-02-14 11:31 [PATCH v3 00/11] KVM: x86: track guest page access Xiao Guangrong
                   ` (10 preceding siblings ...)
  2016-02-14 11:31 ` [PATCH v3 11/11] KVM: MMU: apply page track notifier Xiao Guangrong
@ 2016-02-19 12:00 ` Paolo Bonzini
  2016-02-22 10:05   ` Xiao Guangrong
  11 siblings, 1 reply; 39+ messages in thread
From: Paolo Bonzini @ 2016-02-19 12:00 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song,
	Andrea Arcangeli



On 14/02/2016 12:31, Xiao Guangrong wrote:
> Changelog in v3:
> - refine the code of mmu_need_write_protect() based on Huang Kai's suggestion
> - rebase the patchset against the current code
> 
> Changelog in v2:
> - fix an issue where the memslot's tracking memory was freed when we only
>   moved the memslot or changed its flags
> - do not track gfns that are not mapped in the memslots
> - introduce the nolock APIs at the beginning of the patchset
> - use 'unsigned short' as the track counter to reduce memory usage; this
>   should be enough for shadow page tables and KVMGT
> 
> This patchset introduces a feature that allows us to track page
> access in the guest. Only write access tracking is implemented in
> this version.
> 
> Four APIs are introduced:
> - kvm_page_track_add_page(kvm, gfn, mode): the single guest page @gfn is
>   added to the track pool of the guest instance represented by @kvm;
>   @mode specifies which kind of access to @gfn is tracked
> 
> - kvm_page_track_remove_page(kvm, gfn, mode): the opposite operation of
>   kvm_page_track_add_page(), which removes @gfn from the tracking pool.
>   The gfn is no longer tracked after its last user is gone
> 
> - kvm_page_track_register_notifier(kvm, n): registers a notifier so that
>   events triggered by page tracking are received; when they fire, the
>   n->track_write() callback is called
> 
> - kvm_page_track_unregister_notifier(kvm, n): the opposite operation of
>   kvm_page_track_register_notifier(), which unlinks the notifier and
>   stops delivery of tracked events
> 
> The first user of page tracking is non-leaf shadow page tables, as they
> are always write-protected. It also brings a performance improvement,
> because page tracking speeds up the page fault handler for the tracked
> pages. The kernel build results are as follows:
> 
>    before           after
> real 461.63       real 455.48
> user 4529.55      user 4557.88
> sys 1995.39       sys 1922.57
> 
> Furthermore, it is the infrastructure for other kinds of shadow page
> tables, such as the GPU shadow page table introduced in KVMGT (1) and
> native nested IOMMU.
> 
> This patchset can be divided into two parts:
> - patches 1 ~ 7 implement page tracking
> - the other patches apply page tracking to non-leaf shadow page tables

Xiao,

the patches are very readable and very good.  My comments are only minor.

I still have a doubt: how are you going to handle invalidation of GPU
shadow page tables if a device (emulated in QEMU or even vhost) does DMA
to the PPGTT?  Generally, this was the reason to keep stuff out of KVM
and instead hook into the kernel mm subsystem (as with userfaultfd).

Paolo


* Re: [PATCH v3 00/11] KVM: x86: track guest page access
  2016-02-19 12:00 ` [PATCH v3 00/11] KVM: x86: track guest page access Paolo Bonzini
@ 2016-02-22 10:05   ` Xiao Guangrong
  2016-02-23  3:02     ` Jike Song
  0 siblings, 1 reply; 39+ messages in thread
From: Xiao Guangrong @ 2016-02-22 10:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song,
	Andrea Arcangeli



On 02/19/2016 08:00 PM, Paolo Bonzini wrote:
>
>
> On 14/02/2016 12:31, Xiao Guangrong wrote:
>> Changelog in v3:
>> - refine the code of mmu_need_write_protect() based on Huang Kai's suggestion
>> - rebase the patchset against the current code
>>
>> Changelog in v2:
>> - fix an issue where the memslot's tracking memory was freed when we only
>>    moved the memslot or changed its flags
>> - do not track gfns that are not mapped in the memslots
>> - introduce the nolock APIs at the beginning of the patchset
>> - use 'unsigned short' as the track counter to reduce memory usage; this
>>    should be enough for shadow page tables and KVMGT
>>
>> This patchset introduces a feature that allows us to track page
>> access in the guest. Only write access tracking is implemented in
>> this version.
>>
>> Four APIs are introduced:
>> - kvm_page_track_add_page(kvm, gfn, mode): the single guest page @gfn is
>>    added to the track pool of the guest instance represented by @kvm;
>>    @mode specifies which kind of access to @gfn is tracked
>>
>> - kvm_page_track_remove_page(kvm, gfn, mode): the opposite operation of
>>    kvm_page_track_add_page(), which removes @gfn from the tracking pool.
>>    The gfn is no longer tracked after its last user is gone
>>
>> - kvm_page_track_register_notifier(kvm, n): registers a notifier so that
>>    events triggered by page tracking are received; when they fire, the
>>    n->track_write() callback is called
>>
>> - kvm_page_track_unregister_notifier(kvm, n): the opposite operation of
>>    kvm_page_track_register_notifier(), which unlinks the notifier and
>>    stops delivery of tracked events
>>
>> The first user of page tracking is non-leaf shadow page tables, as they
>> are always write-protected. It also brings a performance improvement,
>> because page tracking speeds up the page fault handler for the tracked
>> pages. The kernel build results are as follows:
>>
>>     before           after
>> real 461.63       real 455.48
>> user 4529.55      user 4557.88
>> sys 1995.39       sys 1922.57
>>
>> Furthermore, it is the infrastructure for other kinds of shadow page
>> tables, such as the GPU shadow page table introduced in KVMGT (1) and
>> native nested IOMMU.
>>
>> This patchset can be divided into two parts:
>> - patches 1 ~ 7 implement page tracking
>> - the other patches apply page tracking to non-leaf shadow page tables
>
> Xiao,
>
> the patches are very readable and very good.  My comments are only minor.

Thank you, Paolo!

>
> I still have a doubt: how are you going to handle invalidation of GPU
> shadow page tables if a device (emulated in QEMU or even vhost) does DMA
> to the PPGTT?

I think Jike is the best one to answer this question. Jike, could you
please clarify it? :)

> Generally, this was the reason to keep stuff out of KVM
> and instead hook into the kernel mm subsystem (as with userfaultfd).

We considered it carefully, but that approach cannot satisfy KVMGT's
requirements. The reasons I explained in the old thread
(https://lkml.org/lkml/2015/12/1/516) are:

"For performance, the shadow GPU page table is performance critical and
requires frequent switching, so it is not good to handle it in userspace.
Also, a Windows guest has many GPU tables and updates them frequently;
that means we need to write-protect a huge number of individual pages,
and I am afraid userfaultfd cannot handle this case efficiently.

For functionality, userfaultfd cannot fill the need of shadow pages
because:
- the page is kept read-only; userfaultfd cannot fix the fault and let
    the vcpu make progress (write access causes a writable gup).

- the access needs to be emulated; however, userfaultfd/the kernel does
    not have the ability to emulate the access, as the access is triggered
    by the guest and the instruction info is stored in the VMCS, so only
    KVM can emulate it.

- the shadow page table needs to be notified after the emulation is
    finished, as it must know the new data written to the page to update
    its page hierarchy. (Some hardware lacks the 'retry' ability, so the
    shadow page table needs to reflect the guest table at all times.)"

Any idea?


* Re: [PATCH v3 00/11] KVM: x86: track guest page access
  2016-02-22 10:05   ` Xiao Guangrong
@ 2016-02-23  3:02     ` Jike Song
  2016-02-23  5:44       ` Tian, Kevin
  2016-02-23 10:01       ` Paolo Bonzini
  0 siblings, 2 replies; 39+ messages in thread
From: Jike Song @ 2016-02-23  3:02 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Paolo Bonzini, gleb, mtosatti, kvm, linux-kernel, kai.huang,
	Andrea Arcangeli, Tian, Kevin

+Kevin

On 02/22/2016 06:05 PM, Xiao Guangrong wrote:
> 
> On 02/19/2016 08:00 PM, Paolo Bonzini wrote:
>>
>> I still have a doubt: how are you going to handle invalidation of GPU
>> shadow page tables if a device (emulated in QEMU or even vhost) does DMA
>> to the PPGTT?
> 
> I think Jike is the better one to answer this question, Jike, could you
> please clarify it? :)
> 

Sure :)

Actually, in the guest the PPGTT is manipulated by the CPU rather than
the GPU. The PPGTT page tables themselves are plain memory, composed
and modified by the GPU driver, i.e. by the CPU in non-root mode.

Given that, we write-protect the guest PPGTT; when the VM writes the
PPGTT, an EPT violation rather than a DMA fault happens.

>> Generally, this was the reason to keep stuff out of KVM
>> and instead hook into the kernel mm subsystem (as with userfaultfd).
> 
> We considered it carefully, but that approach cannot satisfy KVMGT's
> requirements. The reasons I explained in the old thread
> (https://lkml.org/lkml/2015/12/1/516) are:
> 
> "For performance, the shadow GPU page table is performance critical and
> requires frequent switching, so it is not good to handle it in userspace.
> Also, a Windows guest has many GPU tables and updates them frequently;
> that means we need to write-protect a huge number of individual pages,
> and I am afraid userfaultfd cannot handle this case efficiently.
> 
> For functionality, userfaultfd cannot fill the need of shadow pages
> because:
> - the page is kept read-only; userfaultfd cannot fix the fault and let
>     the vcpu make progress (write access causes a writable gup).
> 
> - the access needs to be emulated; however, userfaultfd/the kernel does
>     not have the ability to emulate the access, as the access is triggered
>     by the guest and the instruction info is stored in the VMCS, so only
>     KVM can emulate it.
> 
> - the shadow page table needs to be notified after the emulation is
>     finished, as it must know the new data written to the page to update
>     its page hierarchy. (Some hardware lacks the 'retry' ability, so the
>     shadow page table needs to reflect the guest table at all times.)"
> 
> Any idea?
> 

--
Thanks,
Jike


* Re: [PATCH v3 04/11] KVM: page track: add the framework of guest page tracking
  2016-02-19 11:24   ` Paolo Bonzini
@ 2016-02-23  3:57     ` Xiao Guangrong
  0 siblings, 0 replies; 39+ messages in thread
From: Xiao Guangrong @ 2016-02-23  3:57 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song



On 02/19/2016 07:24 PM, Paolo Bonzini wrote:
>
>
> On 14/02/2016 12:31, Xiao Guangrong wrote:

>>   #define KVM_MAX_VCPUS 255
>>   #define KVM_SOFT_MAX_VCPUS 160
>> @@ -650,6 +651,7 @@ struct kvm_lpage_info {
>>   struct kvm_arch_memory_slot {
>>   	struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES];
>>   	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
>> +	unsigned short *gfn_track[KVM_PAGE_TRACK_MAX];
>
> Please add a comment at struct kvm_mmu_page_role mentioning that the
> number of role bits for shadow pages (i.e. not counting direct and
> invalid) must not exceed 15 (16 theoretically risks overflow already!),
> and counting the 14 bits that are in use.
>

Okay, good idea; it can avoid potential issues in future development. I
will do it in the next version.


* Re: [PATCH v3 05/11] KVM: page track: introduce kvm_page_track_{add,remove}_page
  2016-02-19 11:37   ` Paolo Bonzini
@ 2016-02-23  4:18     ` Xiao Guangrong
  2016-02-23 14:15       ` Paolo Bonzini
  0 siblings, 1 reply; 39+ messages in thread
From: Xiao Guangrong @ 2016-02-23  4:18 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song



On 02/19/2016 07:37 PM, Paolo Bonzini wrote:
>
>
> On 14/02/2016 12:31, Xiao Guangrong wrote:
>> +	/* does tracking count wrap? */
>> +	WARN_ON((count > 0) && (val + count < val));
>
> This doesn't work, because "val + count" is an int.

val is 'unsigned short' and count is 'short', so
'val + count' is not an int...

>
>> +	/* the last tracker has already gone? */
>> +	WARN_ON((count < 0) && (val < !count));
>
> Also, here any underflow should warn.
>
> You can actually use the fact that val + count is an int like this:
>
>      WARN_ON(val + count < 0 || val + count > USHRT_MAX)
>

It looks nice; I will change the type of val to int to simplify the
code.

> and also please return if the warning fires.
>

Okay.

>> +void kvm_page_track_add_page(struct kvm *kvm, gfn_t gfn,
>> +			     enum kvm_page_track_mode mode)
>> +{
>> +	struct kvm_memslots *slots;
>> +	struct kvm_memory_slot *slot;
>> +	int i;
>> +
>> +	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
>> +		slots = __kvm_memslots(kvm, i);
>> +
>> +		slot = __gfn_to_memslot(slots, gfn);
>> +		if (!slot)
>> +			continue;
>> +
>> +		spin_lock(&kvm->mmu_lock);
>> +		kvm_slot_page_track_add_page_nolock(kvm, slot, gfn, mode);
>> +		spin_unlock(&kvm->mmu_lock);
>> +	}
>> +}
>
> I don't think it is right to walk all address spaces.  The good news is

Then we cannot track the page in SMM mode, but I think that is not a big
problem, as SMM is invisible to the OS (it is expected not to hurt the OS)
and the current shadow pages in the normal address space cannot reflect
changes that happened in SMM either. So it is okay to take only the normal
address space into account.

> that you're not using kvm_page_track_{add,remove}_page at all as far as
> I can see, so you can just remove them.

kvm_page_track_{add,remove}_page, which hide the MMU specifics (e.g. slot,
mmu-lock, etc.) and are exported as APIs for other users, are just small
wrappers around kvm_slot_page_track_{add,remove}_page_nolock; the nolock
functions are used in a later patch.

If you think it is not a good time to export these APIs, I am okay with
exporting only the _nolock functions in the next version.

>
> Also, when you will need it, I think it's better to move the
> spin_lock/spin_unlock pair outside the for loop.  With this change,
> perhaps it's better to leave it to the caller completely---but I cannot
> say until I see the caller.

I will remove page tracking for the SMM address space, so there is no loop
in the next version. ;)

>
> In the meanwhile, please leave out _nolock from the other functions' name.

I just wanted to warn users that these functions are not safe, as they are
not protected by the mmu-lock. I will remove these hints if you dislike them.


* Re: [PATCH v3 05/11] KVM: page track: introduce kvm_page_track_{add,remove}_page
  2016-02-19 11:37   ` Paolo Bonzini
@ 2016-02-23  4:18     ` Xiao Guangrong
  0 siblings, 0 replies; 39+ messages in thread
From: Xiao Guangrong @ 2016-02-23  4:18 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song



On 02/19/2016 07:37 PM, Paolo Bonzini wrote:
>
>
> On 14/02/2016 12:31, Xiao Guangrong wrote:
>> +static bool check_mode(enum kvm_page_track_mode mode)
>> +{
>> +	if (mode < 0 || mode >= KVM_PAGE_TRACK_MAX)
>> +		return false;
>> +
>> +	return true;
>> +}
>
> Oops, forgot about this; please rename to page_track_mode_is_valid and
> make it "static inline".

Sure, that looks good to me. :)


* Re: [PATCH v3 06/11] KVM: MMU: let page fault handler be aware tracked page
  2016-02-19 11:45   ` Paolo Bonzini
@ 2016-02-23  4:19     ` Xiao Guangrong
  0 siblings, 0 replies; 39+ messages in thread
From: Xiao Guangrong @ 2016-02-23  4:19 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song



On 02/19/2016 07:45 PM, Paolo Bonzini wrote:
>
>
> On 14/02/2016 12:31, Xiao Guangrong wrote:
>> +/*
>> + * check if the corresponding access on the specified guest page is tracked.
>> + */
>> +bool kvm_page_track_check_mode(struct kvm_vcpu *vcpu, gfn_t gfn,
>> +			       enum kvm_page_track_mode mode)
>
> Please rename to kvm_page_track_is_active.

Got it! Will do it in the next version.


* Re: [PATCH v3 07/11] KVM: page track: add notifier support
  2016-02-19 11:51   ` Paolo Bonzini
@ 2016-02-23  4:34     ` Xiao Guangrong
  2016-02-23 14:16       ` Paolo Bonzini
  0 siblings, 1 reply; 39+ messages in thread
From: Xiao Guangrong @ 2016-02-23  4:34 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song



On 02/19/2016 07:51 PM, Paolo Bonzini wrote:
>
>
> On 14/02/2016 12:31, Xiao Guangrong wrote:

>> +void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
>> +			  int bytes)
>> +{
>> +	struct kvm_page_track_notifier_head *head;
>> +	struct kvm_page_track_notifier_node *n;
>> +	int idx;
>> +
>> +	head = &vcpu->kvm->arch.track_notifier_head;
>
> Please check outside SRCU if the notifier list is empty.  If so, there
> is no need to do the (relatively) expensive srcu_read_lock/unlock.
>

Sounds good to me. I will check it by calling hlist_empty() first, before
taking the SRCU read lock.

> Paolo
>
>> +	idx = srcu_read_lock(&head->track_srcu);
>> +	hlist_for_each_entry_rcu(n, &head->track_notifier_list, node)
>> +		if (n->track_write)
>> +			n->track_write(vcpu, gpa, new, bytes);
>> +	srcu_read_unlock(&head->track_srcu, idx);
>> +}
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index e25ebb7..98019b6 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -4370,6 +4370,7 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
>>   	if (ret < 0)
>>   		return 0;
>
> A kvm_vcpu_mark_page_dirty is missing here, isn't it?  I can take care
> of it, but it would be great if you double-checked this.  If so, that
> should be fixed in stable kernels too.

No. It's already been handled in emulator_write_phys() -> kvm_vcpu_write_guest()
-> kvm_vcpu_write_guest_page() -> __kvm_write_guest_page().

>
> Can you add a kvm_vcpu_note_page_write(vcpu, gpa, val, bytes) function
> that takes care of calling kvm_vcpu_mark_page_dirty, kvm_mmu_pte_write
> and kvm_page_track_write?
>

After this patchset, kvm_mmu_pte_write() is only a static notifier callback
called by kvm_page_track_write().

The dirty tracking in emulator_write_phys() is handled in a public API (as
explained above), while emulator_cmpxchg_emulated() handles it itself. So I
think it is better to leave dirty tracking to the separate paths, no? :)


* Re: [PATCH v3 10/11] KVM: MMU: clear write-flooding on the fast path of tracked page
  2016-02-19 11:55   ` Paolo Bonzini
@ 2016-02-23  4:36     ` Xiao Guangrong
  0 siblings, 0 replies; 39+ messages in thread
From: Xiao Guangrong @ 2016-02-23  4:36 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song



On 02/19/2016 07:55 PM, Paolo Bonzini wrote:
>
>
> On 14/02/2016 12:31, Xiao Guangrong wrote:
>> If the page fault is caused by write access on a write-tracked page, the
>> real shadow page walk is skipped, and we lose the chance to clear write
>> flooding for the page structure the current vcpu is using
>>
>> Fix it by locklessly walking the shadow page table to clear write flooding
>> on the shadow page structure outside the mmu-lock. To do that, we change
>> the count to atomic_t
>
> Should this be moved earlier in the series, so that the issue never
> surfaces?

Okay, I will move it so that it comes after:
[PATCH v3 06/11] KVM: MMU: let page fault handler be aware tracked page


* Re: [PATCH v3 11/11] KVM: MMU: apply page track notifier
  2016-02-19 11:56   ` Paolo Bonzini
@ 2016-02-23  4:40     ` Xiao Guangrong
  2016-02-23 14:17       ` Paolo Bonzini
  0 siblings, 1 reply; 39+ messages in thread
From: Xiao Guangrong @ 2016-02-23  4:40 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song



On 02/19/2016 07:56 PM, Paolo Bonzini wrote:
>
>
> On 14/02/2016 12:31, Xiao Guangrong wrote:
>> Register the notifier to receive write-track events so that we can update
>> our shadow page table
>>
>> It makes kvm_mmu_pte_write() the callback of the notifier; no functionality
>> is changed
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>> ---
>>   arch/x86/include/asm/kvm_host.h |  5 +++--
>>   arch/x86/kvm/mmu.c              | 19 +++++++++++++++++--
>>   arch/x86/kvm/x86.c              |  4 ++--
>>   3 files changed, 22 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 254d103..5246f07 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -696,6 +696,7 @@ struct kvm_arch {
>>   	 */
>>   	struct list_head active_mmu_pages;
>>   	struct list_head zapped_obsolete_pages;
>> +	struct kvm_page_track_notifier_node mmu_sp_tracker;
>>   	struct kvm_page_track_notifier_head track_notifier_head;
>>
>>   	struct list_head assigned_dev_head;
>> @@ -994,6 +995,8 @@ void kvm_mmu_module_exit(void);
>>   void kvm_mmu_destroy(struct kvm_vcpu *vcpu);
>>   int kvm_mmu_create(struct kvm_vcpu *vcpu);
>>   void kvm_mmu_setup(struct kvm_vcpu *vcpu);
>> +void kvm_mmu_init_vm(struct kvm *kvm);
>> +void kvm_mmu_uninit_vm(struct kvm *kvm);
>>   void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
>>   		u64 dirty_mask, u64 nx_mask, u64 x_mask);
>>
>> @@ -1133,8 +1136,6 @@ void kvm_pic_clear_all(struct kvm_pic *pic, int irq_source_id);
>>
>>   void kvm_inject_nmi(struct kvm_vcpu *vcpu);
>>
>> -void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
>> -		       const u8 *new, int bytes);
>>   int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
>>   int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva);
>>   void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu);
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index f924e6c..57cf30b 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -4316,8 +4316,8 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
>>   	return spte;
>>   }
>>
>> -void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
>> -		       const u8 *new, int bytes)
>> +static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
>> +			      const u8 *new, int bytes)
>>   {
>>   	gfn_t gfn = gpa >> PAGE_SHIFT;
>>   	struct kvm_mmu_page *sp;
>> @@ -4531,6 +4531,21 @@ void kvm_mmu_setup(struct kvm_vcpu *vcpu)
>>   	init_kvm_mmu(vcpu);
>>   }
>>
>> +void kvm_mmu_init_vm(struct kvm *kvm)
>> +{
>> +	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
>> +
>> +	node->track_write = kvm_mmu_pte_write;
>> +	kvm_page_track_register_notifier(kvm, node);
>> +}
>> +
>> +void kvm_mmu_uninit_vm(struct kvm *kvm)
>> +{
>> +	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
>> +
>> +	kvm_page_track_unregister_notifier(kvm, node);
>> +}
>> +
>>   /* The return value indicates if tlb flush on all vcpus is needed. */
>>   typedef bool (*slot_level_handler) (struct kvm *kvm, struct kvm_rmap_head *rmap_head);
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 98019b6..319d572 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -4369,7 +4369,6 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
>>   	ret = kvm_vcpu_write_guest(vcpu, gpa, val, bytes);
>>   	if (ret < 0)
>>   		return 0;
>> -	kvm_mmu_pte_write(vcpu, gpa, val, bytes);
>>   	kvm_page_track_write(vcpu, gpa, val, bytes);
>>   	return 1;
>>   }
>> @@ -4628,7 +4627,6 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
>>   		return X86EMUL_CMPXCHG_FAILED;
>>
>>   	kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
>> -	kvm_mmu_pte_write(vcpu, gpa, new, bytes);
>>   	kvm_page_track_write(vcpu, gpa, new, bytes);
>>
>>   	return X86EMUL_CONTINUE;
>> @@ -7751,6 +7749,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>>   	INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn);
>>
>>   	kvm_page_track_init(kvm);
>> +	kvm_mmu_init_vm(kvm);
>>
>>   	return 0;
>>   }
>> @@ -7878,6 +7877,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>>   	kfree(kvm->arch.vioapic);
>>   	kvm_free_vcpus(kvm);
>>   	kfree(rcu_dereference_check(kvm->arch.apic_map, 1));
>> +	kvm_mmu_uninit_vm(kvm);
>
> This function is not necessary, since the VM is disappearing anyway and
> the page tracker is not going to be called.

I think it is still necessary: since we are using SRCU to protect the
notifier, we should wait until all callers of the notifier callbacks are
gone, i.e., synchronize_srcu() is needed anyway.


* RE: [PATCH v3 00/11] KVM: x86: track guest page access
  2016-02-23  3:02     ` Jike Song
@ 2016-02-23  5:44       ` Tian, Kevin
  2016-02-23 12:13         ` Paolo Bonzini
  2016-02-23 10:01       ` Paolo Bonzini
  1 sibling, 1 reply; 39+ messages in thread
From: Tian, Kevin @ 2016-02-23  5:44 UTC (permalink / raw)
  To: Song, Jike, Xiao Guangrong
  Cc: Paolo Bonzini, gleb, mtosatti, kvm, linux-kernel, kai.huang,
	Andrea Arcangeli, Alex Williamson (alex.williamson@redhat.com)

> From: Song, Jike
> Sent: Tuesday, February 23, 2016 11:02 AM
> 
> +Kevin
> 
> On 02/22/2016 06:05 PM, Xiao Guangrong wrote:
> >
> > On 02/19/2016 08:00 PM, Paolo Bonzini wrote:
> >>
> >> I still have a doubt: how are you going to handle invalidation of GPU
> >> shadow page tables if a device (emulated in QEMU or even vhost) does DMA
> >> to the PPGTT?
> >
> > I think Jike is the better one to answer this question, Jike, could you
> > please clarify it? :)
> >
> 
> Sure :)
> 
> Actually, in the guest the PPGTT is manipulated by the CPU rather than
> the GPU. The PPGTT page tables themselves are plain memory, composed
> and modified by the GPU driver, i.e. by the CPU in non-root mode.
> 
> Given that, we write-protect the guest PPGTT; when the VM writes the
> PPGTT, an EPT violation rather than a DMA fault happens.

'DMA to PPGTT' is NOT SUPPORTED by our vGPU device model. Today the
Intel gfx driver doesn't use this method, and we explicitly list that as
a guest driver requirement for supporting a vGPU. If a malicious driver
does program DMA to modify the PPGTT, it can only modify the guest PPGTT
instead of the shadow PPGTT (which is invisible to the guest). So there
is no security issue either.

> 
> >> Generally, this was the reason to keep stuff out of KVM
> >> and instead hook into the kernel mm subsystem (as with userfaultfd).
> >
> > We considered it carefully, but that approach cannot satisfy KVMGT's
> > requirements. The reasons I explained in the old thread
> > (https://lkml.org/lkml/2015/12/1/516) are:
> >
> > "For performance, the shadow GPU page table is performance critical and
> > requires frequent switching, so it is not good to handle it in userspace.
> > Also, a Windows guest has many GPU tables and updates them frequently;
> > that means we need to write-protect a huge number of individual pages,
> > and I am afraid userfaultfd cannot handle this case efficiently.

Yes, performance is the main concern. 

Paolo, we explained the reasons for in-kernel emulation to you earlier,
and you acknowledged them:
----
> > It's definitely a fast path, e.g. command submission, shadow GPU page
> > table, etc. which are all in performance critical path. Another reason is
> > the I/O access frequency, which could be up to 100k/s for some gfx workload.
> > It's important to shorten the emulation path which can help performance
> > a lot. That's the major reason why we keep vGPU device model in the
> > kernel (will merged into i915 driver)
> 
> Ok, thanks---writing numbers down always helps.  MMIO to userspace costs
> 5000 clock cycles on the latest QEMU and processor (and does not need
> the "big QEMU lock" anymore), but still 100k/s is a ~500000 clock cycle
> difference and approximately 15% host CPU usage.
----
(I believe ~500000 above should be ~500M clock cycles)

> >
> > For functionality, userfaultfd cannot fill the need of shadow pages
> > because:
> > - the page is kept read-only; userfaultfd cannot fix the fault and let
> >     the vcpu make progress (write access causes a writable gup).
> >
> > - the access needs to be emulated; however, userfaultfd/the kernel does
> >     not have the ability to emulate the access, as the access is triggered
> >     by the guest and the instruction info is stored in the VMCS, so only
> >     KVM can emulate it.
> >
> > - the shadow page table needs to be notified after the emulation is
> >     finished, as it must know the new data written to the page to update
> >     its page hierarchy. (Some hardware lacks the 'retry' ability, so the
> >     shadow page table needs to reflect the guest table at all times.)"
> >
> > Any idea?
> >
> 

Thanks, Guangrong, for investigating the possibility.

Based on the earlier explanation, we hope the KVM community can rethink
the necessity of supporting in-kernel emulation for KVMGT. The same
framework might be extended in the future to other types of I/O devices
using a similar mediated pass-through concept, with the device model
tightly integrated with the native device driver for efficiency and
simplicity.

Actually, there is a related open question in the KVMGT/VFIO integration
discussion. There are 7 services in total required to support in-kernel
emulation, which can be categorized into two groups:

a) services to connect vGPU with VM, which are essentially what a device
driver is doing (so VFIO can fit here), including:
	1) Selectively pass-through a region to a VM
	2) Trap-and-emulate a region
	3) Inject a virtual interrupt
	4) Pin/unpin guest memory
	5) GPA->IOVA/HVA translation (as a side-effect)

b) services to support device emulation, which are going to be
hypervisor-specific, including:
	6) Map/unmap guest memory
	7) Write-protect a guest memory page

We're working with the VFIO community to add support for category a),
but there is still a gap in category b). This patch series addresses
requirement 7). Requirement 6) is straightforward for KVM. We may
introduce a new file in KVM to wrap them together for in-kernel
emulation, but we need agreement from the community on this direction
first. :-)

Thanks
Kevin


* Re: [PATCH v3 00/11] KVM: x86: track guest page access
  2016-02-23  3:02     ` Jike Song
  2016-02-23  5:44       ` Tian, Kevin
@ 2016-02-23 10:01       ` Paolo Bonzini
  2016-02-23 11:50         ` Jike Song
  1 sibling, 1 reply; 39+ messages in thread
From: Paolo Bonzini @ 2016-02-23 10:01 UTC (permalink / raw)
  To: Jike Song
  Cc: Xiao Guangrong, gleb, mtosatti, kvm, linux-kernel, kai huang,
	Andrea Arcangeli, Kevin Tian



----- Original Message -----
> From: "Jike Song" <jike.song@intel.com>
> To: "Xiao Guangrong" <guangrong.xiao@linux.intel.com>
> Cc: "Paolo Bonzini" <pbonzini@redhat.com>, gleb@kernel.org, mtosatti@redhat.com, kvm@vger.kernel.org,
> linux-kernel@vger.kernel.org, "kai huang" <kai.huang@linux.intel.com>, "Andrea Arcangeli" <aarcange@redhat.com>,
> "Kevin Tian" <kevin.tian@intel.com>
> Sent: Tuesday, February 23, 2016 4:02:25 AM
> Subject: Re: [PATCH v3 00/11] KVM: x86: track guest page access
> 
> +Kevin
> 
> On 02/22/2016 06:05 PM, Xiao Guangrong wrote:
> > 
> > On 02/19/2016 08:00 PM, Paolo Bonzini wrote:
> >>
> >> I still have a doubt: how are you going to handle invalidation of GPU
> >> shadow page tables if a device (emulated in QEMU or even vhost) does DMA
> >> to the PPGTT?
> > 
> > I think Jike is the better one to answer this question, Jike, could you
> > please clarify it? :)
> > 
> 
> Sure :)
> 
> Actually in guest PPGTT is manipulated by CPU rather than GPU. The
> PPGTT page table itself are plain memory, composed & modified by the
> GPU driver, i.e. by CPU in Non-Root mode.
> 
> Given that, we write-protected guest PPGTT, when VM writes PPGTT, EPT
> violation rather than DMA fault happens.

I am not talking of DMA faults; I am talking of a guest that reads
from disk into the PPGTT.  This is emulated DMA, and your approach of
tracking guest page access from KVM means that you are not handling
this.  Is this right?  If so, what happens if the guest does this
kind of operation (for example because it is not using the PPGTT
anymore)?  KVMGT should not get confused the next time it works on
that PPGTT page.

Paolo


* Re: [PATCH v3 00/11] KVM: x86: track guest page access
  2016-02-23 10:01       ` Paolo Bonzini
@ 2016-02-23 11:50         ` Jike Song
  0 siblings, 0 replies; 39+ messages in thread
From: Jike Song @ 2016-02-23 11:50 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiao Guangrong, gleb, mtosatti, kvm, linux-kernel, kai huang,
	Andrea Arcangeli, Kevin Tian

On 02/23/2016 06:01 PM, Paolo Bonzini wrote:
> ----- Original Message -----
>> From: "Jike Song" <jike.song@intel.com>
>> To: "Xiao Guangrong" <guangrong.xiao@linux.intel.com>
>> Cc: "Paolo Bonzini" <pbonzini@redhat.com>, gleb@kernel.org, mtosatti@redhat.com, kvm@vger.kernel.org,
>> linux-kernel@vger.kernel.org, "kai huang" <kai.huang@linux.intel.com>, "Andrea Arcangeli" <aarcange@redhat.com>,
>> "Kevin Tian" <kevin.tian@intel.com>
>> Sent: Tuesday, February 23, 2016 4:02:25 AM
>> Subject: Re: [PATCH v3 00/11] KVM: x86: track guest page access
>>
>> +Kevin
>>
>> On 02/22/2016 06:05 PM, Xiao Guangrong wrote:
>>>
>>> On 02/19/2016 08:00 PM, Paolo Bonzini wrote:
>>>>
>>>> I still have a doubt: how are you going to handle invalidation of GPU
>>>> shadow page tables if a device (emulated in QEMU or even vhost) does DMA
>>>> to the PPGTT?
>>>
>>> I think Jike is the better one to answer this question, Jike, could you
>>> please clarify it? :)
>>>
>>
>> Sure :)
>>
>> Actually in guest PPGTT is manipulated by CPU rather than GPU. The
>> PPGTT page table itself are plain memory, composed & modified by the
>> GPU driver, i.e. by CPU in Non-Root mode.
>>
>> Given that, we write-protected guest PPGTT, when VM writes PPGTT, EPT
>> violation rather than DMA fault happens.
> 

I may still be misunderstanding you, so I apologize in advance ..

> I am not talking of DMA faults; I am talking of a guest that reads
> from disk into the PPGTT.

Into the PPGTT page table itself? As Kevin said in another mail,
this is NOT SUPPORTED.

> This is emulated DMA, and your approach of
> tracking guest page access from KVM means that you are not handling
> this.  Is this right?

Right, our tracking mechanism only cares about CPU writes, not device writes.

However, there is *NO* DMA emulation; it is just similar to passthrough.
The device (IGD) is only capable of reading/writing memory according to
the shadowed PPGTT, which is managed by the vGPU device model; this
guarantees that only memory owned by this vGPU can be mapped.

All we need is to track CPU writes from guest.

> If so, what happens if the guest does this
> kind of operation (for example because it is not using the PPGTT
> anymore)?  KVMGT should not be confused the next time it works on
> that PPGTT page.

As explained above, the device-model won't allow such things to happen.

> 
> Paolo
>

--
Thanks,
Jike


* Re: [PATCH v3 00/11] KVM: x86: track guest page access
  2016-02-23  5:44       ` Tian, Kevin
@ 2016-02-23 12:13         ` Paolo Bonzini
  0 siblings, 0 replies; 39+ messages in thread
From: Paolo Bonzini @ 2016-02-23 12:13 UTC (permalink / raw)
  To: Tian, Kevin, Song, Jike, Xiao Guangrong
  Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, Andrea Arcangeli,
	Alex Williamson (alex.williamson@redhat.com)



On 23/02/2016 06:44, Tian, Kevin wrote:
>> From: Song, Jike
>> Sent: Tuesday, February 23, 2016 11:02 AM
>>
>> +Kevin
>>
>> On 02/22/2016 06:05 PM, Xiao Guangrong wrote:
>>>
>>> On 02/19/2016 08:00 PM, Paolo Bonzini wrote:
>>>>
>>>> I still have a doubt: how are you going to handle invalidation of GPU
>>>> shadow page tables if a device (emulated in QEMU or even vhost) does DMA
>>>> to the PPGTT?
>>>
>>> I think Jike is the better one to answer this question, Jike, could you
>>> please clarify it? :)
>>>
>>
>> Sure :)
>>
>> Actually in guest PPGTT is manipulated by CPU rather than GPU. The
>> PPGTT page table itself are plain memory, composed & modified by the
>> GPU driver, i.e. by CPU in Non-Root mode.
>>
>> Given that, we write-protected guest PPGTT, when VM writes PPGTT, EPT
>> violation rather than DMA fault happens.
> 
> 'DMA to PPGTT' is NOT SUPPORTED on our vGPU device model. Today the
> Intel gfx driver doesn't use this method, and we explicitly list it as a
> guest driver requirement to support a vGPU. If a malicious driver does
> program DMA to modify the PPGTT, it can only modify the guest PPGTT
> instead of the shadow PPGTT (which is invisible to the guest). So there
> is no security issue either.

Ok, thanks for confirming.

Paolo


* Re: [PATCH v3 05/11] KVM: page track: introduce kvm_page_track_{add,remove}_page
  2016-02-23  4:18     ` Xiao Guangrong
@ 2016-02-23 14:15       ` Paolo Bonzini
  0 siblings, 0 replies; 39+ messages in thread
From: Paolo Bonzini @ 2016-02-23 14:15 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song



On 23/02/2016 05:18, Xiao Guangrong wrote:
> 
> 
> On 02/19/2016 07:37 PM, Paolo Bonzini wrote:
>>
>>
>> On 14/02/2016 12:31, Xiao Guangrong wrote:
>>> +    /* does tracking count wrap? */
>>> +    WARN_ON((count > 0) && (val + count < val));
>>
>> This doesn't work, because "val + count" is an int.
> 
> val is 'unsigned short val' and count is 'short', so
> 'val + count' is not int...

Actually, it is.  "val" and "count" are both promoted to int, and the
result of "val + count" is an int!


>>> +void kvm_page_track_add_page(struct kvm *kvm, gfn_t gfn,
>>> +                 enum kvm_page_track_mode mode)
>>> +{
>>> +    struct kvm_memslots *slots;
>>> +    struct kvm_memory_slot *slot;
>>> +    int i;
>>> +
>>> +    for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
>>> +        slots = __kvm_memslots(kvm, i);
>>> +
>>> +        slot = __gfn_to_memslot(slots, gfn);
>>> +        if (!slot)
>>> +            continue;
>>> +
>>> +        spin_lock(&kvm->mmu_lock);
>>> +        kvm_slot_page_track_add_page_nolock(kvm, slot, gfn, mode);
>>> +        spin_unlock(&kvm->mmu_lock);
>>> +    }
>>> +}
>>
>> I don't think it is right to walk all address spaces.  The good news is
> 
> Then we cannot track the page in SMM mode, but I think it is not a big
> problem as SMM is invisible to the OS (it is expected not to hurt the OS)
> and the current shadow pages in the normal spaces cannot reflect the
> changes that happened in SMM either. So it is okay to only take the
> normal space into account.

I think which address space to track depends on the scenario where
you're using page tracking.  For example, in the shadow case you only
track either SMM or non-SMM depending on the CPU's mode.

For KVM-GT you probably need to track only non-SMM.

>> that you're not using kvm_page_track_{add,remove}_page at all as far as
>> I can see, so you can just remove them.
> 
> kvm_page_track_{add,remove}_page, which hide the mmu specifics (e.g. slot,
> mmu-lock, etc.) and are exported as APIs for other users, are just
> small wrappers around kvm_slot_page_track_{add,remove}_page_nolock; the
> nolock functions are used in a later patch.
> 
> If you think it is not a good time to export these APIs, I am okay with
> exporting only the _nolock functions in the next version.

Yes, please.

>> Also, when you will need it, I think it's better to move the
>> spin_lock/spin_unlock pair outside the for loop.  With this change,
>> perhaps it's better to leave it to the caller completely---but I cannot
>> say until I see the caller.
> 
> I will remove page tracking in the SMM address space, so there is no
> loop in the next version. ;)

Instead please just remove the functions completely.  Functions without
a caller are unnecessary.

>> In the meanwhile, please leave out _nolock from the other functions'
>> name.
> 
> I just wanted to warn users that these functions are not safe, as they
> are not protected by the mmu-lock. I will remove these hints if you
> dislike them.

I think there are several other functions that are not protected by the
mmu-lock.  You can instead add kerneldoc comments and mention the
locking requirements there.

The common convention in the kernel is to have function() take the lock
and call __function().  In this case, however, there is no "locked"
function yet; if it comes later, we will rename the functions to add
"__" in front.

Paolo


* Re: [PATCH v3 07/11] KVM: page track: add notifier support
  2016-02-23  4:34     ` Xiao Guangrong
@ 2016-02-23 14:16       ` Paolo Bonzini
  0 siblings, 0 replies; 39+ messages in thread
From: Paolo Bonzini @ 2016-02-23 14:16 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song



On 23/02/2016 05:34, Xiao Guangrong wrote:
>>
>> A kvm_vcpu_mark_page_dirty is missing here, isn't it?  I can take care
>> of it, but it would be great if you double-checked this.  If so, that
>> should be fixed in stable kernels too.
> 
> No. It's already been handled in emulator_write_phys() ->
> kvm_vcpu_write_guest()
> -> kvm_vcpu_write_guest_page() -> __kvm_write_guest_page().

You're right...

>>
>> Can you add a kvm_vcpu_note_page_write(vcpu, gpa, val, bytes) function
>> that takes care of calling kvm_vcpu_mark_page_dirty, kvm_mmu_pte_write
>> and kvm_page_track_write?
>>
> 
> After this patchset, kvm_mmu_pte_write() is only a static notifier
> callback called by kvm_page_track_write().
> 
> And the dirty tracking in emulator_write_phys() is handled in a public
> API (as explained above), while emulator_cmpxchg_emulated() handles it
> itself. So I think it is better to leave dirty tracking to the separate
> paths, no? :)

... and here it is indeed better to leave things as they are in v3.  Thanks,

Paolo


* Re: [PATCH v3 11/11] KVM: MMU: apply page track notifier
  2016-02-23  4:40     ` Xiao Guangrong
@ 2016-02-23 14:17       ` Paolo Bonzini
  0 siblings, 0 replies; 39+ messages in thread
From: Paolo Bonzini @ 2016-02-23 14:17 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: gleb, mtosatti, kvm, linux-kernel, kai.huang, jike.song



On 23/02/2016 05:40, Xiao Guangrong wrote:
>>>
>>> @@ -7878,6 +7877,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>>>       kfree(kvm->arch.vioapic);
>>>       kvm_free_vcpus(kvm);
>>>       kfree(rcu_dereference_check(kvm->arch.apic_map, 1));
>>> +    kvm_mmu_uninit_vm(kvm);
>>
>> This function is not necessary, since the VM is disappearing anyway and
>> the page tracker is not going to be called.
> 
> I think it is still necessary: as we are using SRCU to protect the
> notifier, we should wait until all callers of the notifier callbacks
> are gone, i.e., synchronize_srcu() is needed anyway.

You're right.

Paolo


end of thread, other threads:[~2016-02-23 14:17 UTC | newest]

Thread overview: 39+ messages
2016-02-14 11:31 [PATCH v3 00/11] KVM: x86: track guest page access Xiao Guangrong
2016-02-14 11:31 ` [PATCH v3 01/11] KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowed Xiao Guangrong
2016-02-19 11:08   ` Paolo Bonzini
2016-02-14 11:31 ` [PATCH v3 02/11] KVM: MMU: introduce kvm_mmu_gfn_{allow,disallow}_lpage Xiao Guangrong
2016-02-19 11:09   ` Paolo Bonzini
2016-02-14 11:31 ` [PATCH v3 03/11] KVM: MMU: introduce kvm_mmu_slot_gfn_write_protect Xiao Guangrong
2016-02-19 11:18   ` Paolo Bonzini
2016-02-14 11:31 ` [PATCH v3 04/11] KVM: page track: add the framework of guest page tracking Xiao Guangrong
2016-02-19 11:24   ` Paolo Bonzini
2016-02-23  3:57     ` Xiao Guangrong
2016-02-14 11:31 ` [PATCH v3 05/11] KVM: page track: introduce kvm_page_track_{add,remove}_page Xiao Guangrong
2016-02-19 11:37   ` Paolo Bonzini
2016-02-23  4:18     ` Xiao Guangrong
2016-02-23 14:15       ` Paolo Bonzini
2016-02-19 11:37   ` Paolo Bonzini
2016-02-23  4:18     ` Xiao Guangrong
2016-02-14 11:31 ` [PATCH v3 06/11] KVM: MMU: let page fault handler be aware tracked page Xiao Guangrong
2016-02-19 11:45   ` Paolo Bonzini
2016-02-23  4:19     ` Xiao Guangrong
2016-02-14 11:31 ` [PATCH v3 07/11] KVM: page track: add notifier support Xiao Guangrong
2016-02-19 11:51   ` Paolo Bonzini
2016-02-23  4:34     ` Xiao Guangrong
2016-02-23 14:16       ` Paolo Bonzini
2016-02-14 11:31 ` [PATCH v3 08/11] KVM: MMU: use page track for non-leaf shadow pages Xiao Guangrong
2016-02-14 11:31 ` [PATCH v3 09/11] KVM: MMU: simplify mmu_need_write_protect Xiao Guangrong
2016-02-14 11:31 ` [PATCH v3 10/11] KVM: MMU: clear write-flooding on the fast path of tracked page Xiao Guangrong
2016-02-19 11:55   ` Paolo Bonzini
2016-02-23  4:36     ` Xiao Guangrong
2016-02-14 11:31 ` [PATCH v3 11/11] KVM: MMU: apply page track notifier Xiao Guangrong
2016-02-19 11:56   ` Paolo Bonzini
2016-02-23  4:40     ` Xiao Guangrong
2016-02-23 14:17       ` Paolo Bonzini
2016-02-19 12:00 ` [PATCH v3 00/11] KVM: x86: track guest page access Paolo Bonzini
2016-02-22 10:05   ` Xiao Guangrong
2016-02-23  3:02     ` Jike Song
2016-02-23  5:44       ` Tian, Kevin
2016-02-23 12:13         ` Paolo Bonzini
2016-02-23 10:01       ` Paolo Bonzini
2016-02-23 11:50         ` Jike Song
