* [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti
This series started from looking at mmu_unsync_walk for the ubsan thread.
Patches 1 and 2 are the result of the discussions in that thread.
Patches 3 to 9 do more cleanups in __kvm_sync_page and its callers.
Among other changes, they remove kvm_sync_page_transient and avoid
duplicate code between __kvm_sync_page and kvm_sync_pages.
I stopped where I had questions about the existing kvm_mmu_get_page
code (see patch 8 for the question). However, perhaps more cleanups
are possible, also thanks to Takuya's work on that function and on
link_shadow_page.
Patches 10 to 12 are just micro-optimizations.
Guangrong, it would be great if you took a look since you know this part
of KVM very well.
I have tested this series minus patch 9, and it survived installation
of various Linux and Windows guests with EPT disabled. Of course,
before committing the patches I will retest with patch 9 included.
Paolo
Paolo Bonzini (11):
KVM: MMU: Fix ubsan warnings
KVM: MMU: introduce kvm_mmu_flush_or_zap
KVM: MMU: move TLB flush out of __kvm_sync_page
KVM: MMU: use kvm_sync_page in kvm_sync_pages
KVM: MMU: cleanup __kvm_sync_page and its callers
KVM: MMU: invert return value of FNAME(sync_page) and *kvm_sync_page*
KVM: MMU: move zap/flush to kvm_mmu_get_page
KVM: MMU: coalesce zapping page after mmu_sync_children
KVM: mark memory barrier with smp_mb__after_atomic
KVM: MMU: simplify last_pte_bitmap
KVM: MMU: micro-optimize gpte_access
Xiao Guangrong (1):
KVM: MMU: check kvm_mmu_pages and mmu_page_path indices
arch/x86/include/asm/kvm_host.h | 6 +-
arch/x86/kvm/mmu.c | 216 ++++++++++++++++++++++------------------
arch/x86/kvm/paging_tmpl.h | 11 +-
virt/kvm/kvm_main.c | 2 +-
4 files changed, 126 insertions(+), 109 deletions(-)
--
1.8.3.1
* [PATCH 01/12] KVM: MMU: Fix ubsan warnings
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti
kvm_mmu_pages_init is doing some really yucky stuff. It is setting
up a sentinel for mmu_pages_clear_parents; however, because of a) the
way levels are numbered starting from 1 and b) the way mmu_page_path
sizes its arrays with PT64_ROOT_LEVEL-1 elements, the access can be
out of bounds. This is harmless because the code overwrites up to the
first two elements of parents->idx and these are initialized, and
because the sentinel is not needed in this case---mmu_pages_clear_parents
exits anyway when it gets to the end of the array. However, ubsan
complains, and everyone else should too.
This fix does three things. First, it makes the mmu_page_path arrays
PT64_ROOT_LEVEL elements in size, so that we can write to them without
checking the level in advance. Second, it splits kvm_mmu_pages_init
between mmu_unsync_walk (which now resets the struct kvm_mmu_pages) and
for_each_sp (which now places the NULL sentinel at the end of the current
path). This is okay because the mmu_page_path is only used in
mmu_pages_clear_parents; mmu_pages_clear_parents itself is called within
a for_each_sp iterator, and hence always after a call to mmu_pages_next.
Third, it changes mmu_pages_clear_parents to just use the sentinel to
stop iteration, without checking the bounds on level.
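As a concrete example (a sketch of the resulting layout, not code from
the patch), take a walk rooted at a level-4 shadow page "root" that
collected a single unsync path root -> sp3 -> sp2 -> sp1 into pvec:

	/*
	 * After mmu_pages_first() and mmu_pages_next():
	 *
	 *   parents->parent[3] = NULL;  // sentinel; writing it is what
	 *                               // needed the larger arrays
	 *   parents->parent[2] = root;  // sp3's parent
	 *   parents->parent[1] = sp3;   // sp2's parent
	 *   parents->parent[0] = sp2;   // sp1's parent
	 *   parents->idx[2]    = index of sp3 within root
	 *   parents->idx[1]    = index of sp2 within sp3
	 *   parents->idx[0]    = index of sp1 within sp2
	 *
	 * mmu_pages_clear_parents() clears unsync_child bits at levels
	 * 0, 1, 2 of these arrays and stops on the NULL sentinel in
	 * parent[3], with no bounds check on level.
	 */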
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reported-by: Mike Krinkin <krinkin.m.u@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/mmu.c | 57 +++++++++++++++++++++++++++++++-----------------------
1 file changed, 33 insertions(+), 24 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 07f4c26a10d3..4dee855897cf 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1843,6 +1843,7 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
static int mmu_unsync_walk(struct kvm_mmu_page *sp,
struct kvm_mmu_pages *pvec)
{
+ pvec->nr = 0;
if (!sp->unsync_children)
return 0;
@@ -1956,13 +1957,12 @@ static void kvm_sync_pages(struct kvm_vcpu *vcpu, gfn_t gfn)
}
struct mmu_page_path {
- struct kvm_mmu_page *parent[PT64_ROOT_LEVEL-1];
- unsigned int idx[PT64_ROOT_LEVEL-1];
+ struct kvm_mmu_page *parent[PT64_ROOT_LEVEL];
+ unsigned int idx[PT64_ROOT_LEVEL];
};
#define for_each_sp(pvec, sp, parents, i) \
- for (i = mmu_pages_next(&pvec, &parents, -1), \
- sp = pvec.page[i].sp; \
+ for (i = mmu_pages_first(&pvec, &parents); \
i < pvec.nr && ({ sp = pvec.page[i].sp; 1;}); \
i = mmu_pages_next(&pvec, &parents, i))
@@ -1974,19 +1974,41 @@ static int mmu_pages_next(struct kvm_mmu_pages *pvec,
for (n = i+1; n < pvec->nr; n++) {
struct kvm_mmu_page *sp = pvec->page[n].sp;
+ unsigned idx = pvec->page[n].idx;
+ int level = sp->role.level;
- if (sp->role.level == PT_PAGE_TABLE_LEVEL) {
- parents->idx[0] = pvec->page[n].idx;
- return n;
- }
+ parents->idx[level-1] = idx;
+ if (level == PT_PAGE_TABLE_LEVEL)
+ break;
- parents->parent[sp->role.level-2] = sp;
- parents->idx[sp->role.level-1] = pvec->page[n].idx;
+ parents->parent[level-2] = sp;
}
return n;
}
+static int mmu_pages_first(struct kvm_mmu_pages *pvec,
+ struct mmu_page_path *parents)
+{
+ struct kvm_mmu_page *sp;
+ int level;
+
+ if (pvec->nr == 0)
+ return 0;
+
+ sp = pvec->page[0].sp;
+ level = sp->role.level;
+ WARN_ON(level == PT_PAGE_TABLE_LEVEL);
+
+ parents->parent[level-2] = sp;
+
+ /* Also set up a sentinel. Further entries in pvec are all
+ * children of sp, so this element is never overwritten.
+ */
+ parents->parent[level-1] = NULL;
+ return mmu_pages_next(pvec, parents, 0);
+}
+
static void mmu_pages_clear_parents(struct mmu_page_path *parents)
{
struct kvm_mmu_page *sp;
@@ -1994,22 +2016,13 @@ static void mmu_pages_clear_parents(struct mmu_page_path *parents)
do {
unsigned int idx = parents->idx[level];
-
sp = parents->parent[level];
if (!sp)
return;
clear_unsync_child_bit(sp, idx);
level++;
- } while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children);
-}
-
-static void kvm_mmu_pages_init(struct kvm_mmu_page *parent,
- struct mmu_page_path *parents,
- struct kvm_mmu_pages *pvec)
-{
- parents->parent[parent->role.level-1] = NULL;
- pvec->nr = 0;
+ } while (!sp->unsync_children);
}
static void mmu_sync_children(struct kvm_vcpu *vcpu,
@@ -2021,7 +2034,6 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
struct kvm_mmu_pages pages;
LIST_HEAD(invalid_list);
- kvm_mmu_pages_init(parent, &parents, &pages);
while (mmu_unsync_walk(parent, &pages)) {
bool protected = false;
@@ -2037,7 +2049,6 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
}
kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
cond_resched_lock(&vcpu->kvm->mmu_lock);
- kvm_mmu_pages_init(parent, &parents, &pages);
}
}
@@ -2269,7 +2280,6 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
if (parent->role.level == PT_PAGE_TABLE_LEVEL)
return 0;
- kvm_mmu_pages_init(parent, &parents, &pages);
while (mmu_unsync_walk(parent, &pages)) {
struct kvm_mmu_page *sp;
@@ -2278,7 +2288,6 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
mmu_pages_clear_parents(&parents);
zapped++;
}
- kvm_mmu_pages_init(parent, &parents, &pages);
}
return zapped;
--
1.8.3.1
* [PATCH 02/12] KVM: MMU: check kvm_mmu_pages and mmu_page_path indices
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti
From: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Give a special invalid index to the root of the walk, so that we
can check the consistency of kvm_mmu_pages and mmu_page_path.
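The root is the one pvec entry added by mmu_unsync_walk itself rather
than by the recursive walk; a sketch of what the checks catch:

	/*
	 * pvec->page[0] is always the root of the walk, so:
	 *
	 *   mmu_pages_first() can WARN if page[0].idx is ever a real
	 *   index, and mmu_pages_clear_parents() can WARN if an
	 *   INVALID_INDEX ever leaks into parents->idx[], which would
	 *   mean kvm_mmu_pages and mmu_page_path went out of sync.
	 */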
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
[Extracted from a bigger patch proposed by Guangrong. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/mmu.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4dee855897cf..3060873d8cab 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1840,6 +1840,8 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
return nr_unsync_leaf;
}
+#define INVALID_INDEX (-1)
+
static int mmu_unsync_walk(struct kvm_mmu_page *sp,
struct kvm_mmu_pages *pvec)
{
@@ -1847,7 +1849,7 @@ static int mmu_unsync_walk(struct kvm_mmu_page *sp,
if (!sp->unsync_children)
return 0;
- mmu_pages_add(pvec, sp, 0);
+ mmu_pages_add(pvec, sp, INVALID_INDEX);
return __mmu_unsync_walk(sp, pvec);
}
@@ -1996,6 +1998,8 @@ static int mmu_pages_first(struct kvm_mmu_pages *pvec,
if (pvec->nr == 0)
return 0;
+ WARN_ON(pvec->page[0].idx != INVALID_INDEX);
+
sp = pvec->page[0].sp;
level = sp->role.level;
WARN_ON(level == PT_PAGE_TABLE_LEVEL);
@@ -2020,6 +2024,7 @@ static void mmu_pages_clear_parents(struct mmu_page_path *parents)
if (!sp)
return;
+ WARN_ON(idx == INVALID_INDEX);
clear_unsync_child_bit(sp, idx);
level++;
} while (!sp->unsync_children);
--
1.8.3.1
* [PATCH 03/12] KVM: MMU: introduce kvm_mmu_flush_or_zap
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti
This is a generalization of mmu_pte_write_flush_tlb that also
takes care of calling kvm_mmu_commit_zap_page. The next
patches will introduce more uses.
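The pattern the helper enables in its callers looks like this (a
sketch; should_zap() stands for whatever per-page condition a caller
checks and is not a real function):

	LIST_HEAD(invalid_list);
	struct kvm_mmu_page *sp;
	bool flush = false;

	for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn) {
		if (should_zap(sp))	/* hypothetical predicate */
			kvm_mmu_prepare_zap_page(vcpu->kvm, sp,
						 &invalid_list);
		else
			flush = true;	/* page modified in place */
	}

	/*
	 * Committing the zaps already flushes remote TLBs; otherwise
	 * flush according to the accumulated flags.
	 */
	kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);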
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/mmu.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3060873d8cab..6bf74d8d4989 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4125,11 +4125,14 @@ static bool need_remote_flush(u64 old, u64 new)
return (old & ~new & PT64_PERM_MASK) != 0;
}
-static void mmu_pte_write_flush_tlb(struct kvm_vcpu *vcpu, bool zap_page,
- bool remote_flush, bool local_flush)
+static void kvm_mmu_flush_or_zap(struct kvm_vcpu *vcpu,
+ struct list_head *invalid_list,
+ bool remote_flush, bool local_flush)
{
- if (zap_page)
+ if (!list_empty(invalid_list)) {
+ kvm_mmu_commit_zap_page(vcpu->kvm, invalid_list);
return;
+ }
if (remote_flush)
kvm_flush_remote_tlbs(vcpu->kvm);
@@ -4256,7 +4259,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
LIST_HEAD(invalid_list);
u64 entry, gentry, *spte;
int npte;
- bool remote_flush, local_flush, zap_page;
+ bool remote_flush, local_flush;
union kvm_mmu_page_role mask = { };
mask.cr0_wp = 1;
@@ -4273,7 +4276,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
if (!ACCESS_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
return;
- zap_page = remote_flush = local_flush = false;
+ remote_flush = local_flush = false;
pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
@@ -4293,8 +4296,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn) {
if (detect_write_misaligned(sp, gpa, bytes) ||
detect_write_flooding(sp)) {
- zap_page |= !!kvm_mmu_prepare_zap_page(vcpu->kvm, sp,
- &invalid_list);
+ kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list);
++vcpu->kvm->stat.mmu_flooded;
continue;
}
@@ -4316,8 +4318,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
++spte;
}
}
- mmu_pte_write_flush_tlb(vcpu, zap_page, remote_flush, local_flush);
- kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
+ kvm_mmu_flush_or_zap(vcpu, &invalid_list, remote_flush, local_flush);
kvm_mmu_audit(vcpu, AUDIT_POST_PTE_WRITE);
spin_unlock(&vcpu->kvm->mmu_lock);
}
--
1.8.3.1
* [PATCH 04/12] KVM: MMU: move TLB flush out of __kvm_sync_page
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti
By doing this, kvm_sync_pages can use __kvm_sync_page instead of
reinventing it. Because of kvm_mmu_flush_or_zap, the code does not
end up being more complex than before, and more cleanups to kvm_sync_pages
will come in the next patches.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/mmu.c | 53 ++++++++++++++++++++++++-----------------------------
1 file changed, 24 insertions(+), 29 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6bf74d8d4989..c440a28822b7 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1902,10 +1902,24 @@ static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
return 1;
}
- kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
return 0;
}
+static void kvm_mmu_flush_or_zap(struct kvm_vcpu *vcpu,
+ struct list_head *invalid_list,
+ bool remote_flush, bool local_flush)
+{
+ if (!list_empty(invalid_list)) {
+ kvm_mmu_commit_zap_page(vcpu->kvm, invalid_list);
+ return;
+ }
+
+ if (remote_flush)
+ kvm_flush_remote_tlbs(vcpu->kvm);
+ else if (local_flush)
+ kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+}
+
static int kvm_sync_page_transient(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *sp)
{
@@ -1913,8 +1927,7 @@ static int kvm_sync_page_transient(struct kvm_vcpu *vcpu,
int ret;
ret = __kvm_sync_page(vcpu, sp, &invalid_list, false);
- if (ret)
- kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
+ kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, !ret);
return ret;
}
@@ -1945,17 +1958,11 @@ static void kvm_sync_pages(struct kvm_vcpu *vcpu, gfn_t gfn)
WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
kvm_unlink_unsync_page(vcpu->kvm, s);
- if ((s->role.cr4_pae != !!is_pae(vcpu)) ||
- (vcpu->arch.mmu.sync_page(vcpu, s))) {
- kvm_mmu_prepare_zap_page(vcpu->kvm, s, &invalid_list);
- continue;
- }
- flush = true;
+ if (!__kvm_sync_page(vcpu, s, &invalid_list, false))
+ flush = true;
}
- kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
- if (flush)
- kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+ kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
}
struct mmu_page_path {
@@ -2041,6 +2048,7 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
while (mmu_unsync_walk(parent, &pages)) {
bool protected = false;
+ bool flush = false;
for_each_sp(pages, sp, parents, i)
protected |= rmap_write_protect(vcpu, sp->gfn);
@@ -2049,10 +2057,12 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
kvm_flush_remote_tlbs(vcpu->kvm);
for_each_sp(pages, sp, parents, i) {
- kvm_sync_page(vcpu, sp, &invalid_list);
+ if (!kvm_sync_page(vcpu, sp, &invalid_list))
+ flush = true;
+
mmu_pages_clear_parents(&parents);
}
- kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
+ kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
cond_resched_lock(&vcpu->kvm->mmu_lock);
}
}
@@ -4125,21 +4135,6 @@ static bool need_remote_flush(u64 old, u64 new)
return (old & ~new & PT64_PERM_MASK) != 0;
}
-static void kvm_mmu_flush_or_zap(struct kvm_vcpu *vcpu,
- struct list_head *invalid_list,
- bool remote_flush, bool local_flush)
-{
- if (!list_empty(invalid_list)) {
- kvm_mmu_commit_zap_page(vcpu->kvm, invalid_list);
- return;
- }
-
- if (remote_flush)
- kvm_flush_remote_tlbs(vcpu->kvm);
- else if (local_flush)
- kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
-}
-
static u64 mmu_pte_write_fetch_gpte(struct kvm_vcpu *vcpu, gpa_t *gpa,
const u8 *new, int *bytes)
{
--
1.8.3.1
* [PATCH 05/12] KVM: MMU: use kvm_sync_page in kvm_sync_pages
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti
If the last argument is true, kvm_unlink_unsync_page is called anyway in
__kvm_sync_page (either by kvm_mmu_prepare_zap_page or by __kvm_sync_page
itself). Therefore, kvm_sync_pages can just call kvm_sync_page, instead
of going through kvm_unlink_unsync_page+__kvm_sync_page.
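A sketch of the equivalence, as of the code after the previous patches:

	/*
	 * kvm_sync_page(vcpu, s, list)
	 *   == __kvm_sync_page(vcpu, s, list, true)
	 *
	 * and with clear_unsync == true, __kvm_sync_page() either calls
	 * kvm_unlink_unsync_page() itself or zaps the page through
	 * kvm_mmu_prepare_zap_page(), which unlinks it too.  Either way
	 * the page ends up off the unsync list, so the explicit
	 * kvm_unlink_unsync_page() in kvm_sync_pages() is redundant.
	 */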
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/mmu.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c440a28822b7..240c0dbb140f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1957,8 +1957,7 @@ static void kvm_sync_pages(struct kvm_vcpu *vcpu, gfn_t gfn)
continue;
WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
- kvm_unlink_unsync_page(vcpu->kvm, s);
- if (!__kvm_sync_page(vcpu, s, &invalid_list, false))
+ if (!kvm_sync_page(vcpu, s, &invalid_list))
flush = true;
}
--
1.8.3.1
* [PATCH 06/12] KVM: MMU: cleanup __kvm_sync_page and its callers
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti
Calling kvm_unlink_unsync_page in the middle of __kvm_sync_page makes
things unnecessarily tricky. If kvm_mmu_prepare_zap_page is called,
it will call kvm_unlink_unsync_page too. So kvm_unlink_unsync_page can
be called just as well at the beginning or at the end of __kvm_sync_page,
which means that we can do it in kvm_sync_page too and remove the
parameter.
kvm_sync_page ends up with the same code that kvm_sync_pages had
before the previous patch.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/mmu.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 240c0dbb140f..56fa1636c0cf 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1887,16 +1887,13 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
/* @sp->gfn should be write-protected at the call site */
static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
- struct list_head *invalid_list, bool clear_unsync)
+ struct list_head *invalid_list)
{
if (sp->role.cr4_pae != !!is_pae(vcpu)) {
kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
return 1;
}
- if (clear_unsync)
- kvm_unlink_unsync_page(vcpu->kvm, sp);
-
if (vcpu->arch.mmu.sync_page(vcpu, sp)) {
kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
return 1;
@@ -1926,7 +1923,7 @@ static int kvm_sync_page_transient(struct kvm_vcpu *vcpu,
LIST_HEAD(invalid_list);
int ret;
- ret = __kvm_sync_page(vcpu, sp, &invalid_list, false);
+ ret = __kvm_sync_page(vcpu, sp, &invalid_list);
kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, !ret);
return ret;
@@ -1942,7 +1939,8 @@ static void mmu_audit_disable(void) { }
static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
struct list_head *invalid_list)
{
- return __kvm_sync_page(vcpu, sp, invalid_list, true);
+ kvm_unlink_unsync_page(vcpu->kvm, sp);
+ return __kvm_sync_page(vcpu, sp, invalid_list);
}
/* @gfn should be write-protected at the call site */
--
1.8.3.1
* [PATCH 07/12] KVM: MMU: invert return value of FNAME(sync_page) and *kvm_sync_page*
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti
Return true if the page was synced (and the TLB must be flushed)
and false if the page was zapped.
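With the inverted convention, callers can accumulate the flush decision
with a plain OR; a sketch of the resulting pattern (it matches the
mmu_sync_children hunk below):

	bool flush = false;

	for_each_sp(pages, sp, parents, i) {
		/* true: synced, flush needed; false: zapped onto the list */
		flush |= kvm_sync_page(vcpu, sp, &invalid_list);
		mmu_pages_clear_parents(&parents);
	}
	kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);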
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/mmu.c | 29 +++++++++++++----------------
arch/x86/kvm/paging_tmpl.h | 4 ++--
2 files changed, 15 insertions(+), 18 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 56fa1636c0cf..e3215cc89d97 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1886,20 +1886,20 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
if ((_sp)->role.direct || (_sp)->role.invalid) {} else
/* @sp->gfn should be write-protected at the call site */
-static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
- struct list_head *invalid_list)
+static bool __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+ struct list_head *invalid_list)
{
if (sp->role.cr4_pae != !!is_pae(vcpu)) {
kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
- return 1;
+ return false;
}
- if (vcpu->arch.mmu.sync_page(vcpu, sp)) {
+ if (vcpu->arch.mmu.sync_page(vcpu, sp) == 0) {
kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
- return 1;
+ return false;
}
- return 0;
+ return true;
}
static void kvm_mmu_flush_or_zap(struct kvm_vcpu *vcpu,
@@ -1917,14 +1917,14 @@ static void kvm_mmu_flush_or_zap(struct kvm_vcpu *vcpu,
kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
}
-static int kvm_sync_page_transient(struct kvm_vcpu *vcpu,
- struct kvm_mmu_page *sp)
+static bool kvm_sync_page_transient(struct kvm_vcpu *vcpu,
+ struct kvm_mmu_page *sp)
{
LIST_HEAD(invalid_list);
int ret;
ret = __kvm_sync_page(vcpu, sp, &invalid_list);
- kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, !ret);
+ kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, ret);
return ret;
}
@@ -1936,7 +1936,7 @@ static void kvm_mmu_audit(struct kvm_vcpu *vcpu, int point) { }
static void mmu_audit_disable(void) { }
#endif
-static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+static bool kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
struct list_head *invalid_list)
{
kvm_unlink_unsync_page(vcpu->kvm, sp);
@@ -1955,8 +1955,7 @@ static void kvm_sync_pages(struct kvm_vcpu *vcpu, gfn_t gfn)
continue;
WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
- if (!kvm_sync_page(vcpu, s, &invalid_list))
- flush = true;
+ flush |= kvm_sync_page(vcpu, s, &invalid_list);
}
kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
@@ -2054,9 +2053,7 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
kvm_flush_remote_tlbs(vcpu->kvm);
for_each_sp(pages, sp, parents, i) {
- if (!kvm_sync_page(vcpu, sp, &invalid_list))
- flush = true;
-
+ flush |= kvm_sync_page(vcpu, sp, &invalid_list);
mmu_pages_clear_parents(&parents);
}
kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
@@ -2115,7 +2112,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
if (sp->role.word != role.word)
continue;
- if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
+ if (sp->unsync && !kvm_sync_page_transient(vcpu, sp))
break;
if (sp->unsync_children)
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 05827ff7bd2e..2e019b103249 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -938,7 +938,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
if (kvm_vcpu_read_guest_atomic(vcpu, pte_gpa, &gpte,
sizeof(pt_element_t)))
- return -EINVAL;
+ return 0;
if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) {
vcpu->kvm->tlbs_dirty++;
@@ -970,7 +970,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
host_writable);
}
- return !nr_present;
+ return nr_present;
}
#undef pt_element_t
--
1.8.3.1
* [PATCH 08/12] KVM: MMU: move zap/flush to kvm_mmu_get_page
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti
kvm_mmu_get_page is the only caller of kvm_sync_page_transient
and kvm_sync_pages. Moving the handling of the invalid_list there
removes the need for the underdocumented kvm_sync_page_transient
function.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
Guangrong, at this point I am confused about why
kvm_sync_page_transient didn't clear sp->unsync. Do
you remember? Or perhaps kvm_mmu_get_page could just
call kvm_sync_page now?
Also, can you explain the need_sync variable in
kvm_mmu_get_page?
arch/x86/kvm/mmu.c | 43 ++++++++++++++++++++++---------------------
1 file changed, 22 insertions(+), 21 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e3215cc89d97..725316df32ec 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1917,18 +1917,6 @@ static void kvm_mmu_flush_or_zap(struct kvm_vcpu *vcpu,
kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
}
-static bool kvm_sync_page_transient(struct kvm_vcpu *vcpu,
- struct kvm_mmu_page *sp)
-{
- LIST_HEAD(invalid_list);
- int ret;
-
- ret = __kvm_sync_page(vcpu, sp, &invalid_list);
- kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, ret);
-
- return ret;
-}
-
#ifdef CONFIG_KVM_MMU_AUDIT
#include "mmu_audit.c"
#else
@@ -1944,21 +1932,21 @@ static bool kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
}
/* @gfn should be write-protected at the call site */
-static void kvm_sync_pages(struct kvm_vcpu *vcpu, gfn_t gfn)
+static bool kvm_sync_pages(struct kvm_vcpu *vcpu, gfn_t gfn,
+ struct list_head *invalid_list)
{
struct kvm_mmu_page *s;
- LIST_HEAD(invalid_list);
- bool flush = false;
+ bool ret = false;
for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
if (!s->unsync)
continue;
WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL);
- flush |= kvm_sync_page(vcpu, s, &invalid_list);
+ ret |= kvm_sync_page(vcpu, s, invalid_list);
}
- kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
+ return ret;
}
struct mmu_page_path {
@@ -2089,6 +2077,8 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
unsigned quadrant;
struct kvm_mmu_page *sp;
bool need_sync = false;
+ bool flush = false;
+ LIST_HEAD(invalid_list);
role = vcpu->arch.mmu.base_role;
role.level = level;
@@ -2112,8 +2103,16 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
if (sp->role.word != role.word)
continue;
- if (sp->unsync && !kvm_sync_page_transient(vcpu, sp))
- break;
+ if (sp->unsync) {
+ /* The page is good, but __kvm_sync_page might still end
+ * up zapping it. If so, break in order to rebuild it.
+ */
+ if (!__kvm_sync_page(vcpu, sp, &invalid_list))
+ break;
+
+ WARN_ON(!list_empty(&invalid_list));
+ kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+ }
if (sp->unsync_children)
kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
@@ -2133,13 +2132,15 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
if (rmap_write_protect(vcpu, gfn))
kvm_flush_remote_tlbs(vcpu->kvm);
if (level > PT_PAGE_TABLE_LEVEL && need_sync)
- kvm_sync_pages(vcpu, gfn);
+ flush |= kvm_sync_pages(vcpu, gfn, &invalid_list);
account_shadowed(vcpu->kvm, sp);
}
sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
clear_page(sp->spt);
trace_kvm_mmu_get_page(sp, true);
+
+ kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
return sp;
}
--
1.8.3.1
* [PATCH 09/12] KVM: MMU: coalesce zapping page after mmu_sync_children
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti
Move the call to kvm_mmu_flush_or_zap outside the loop. When write
protection requires a remote TLB flush inside the loop, that flush also
covers any pages synced in earlier iterations, so the accumulated flush
flag can be reset to false at that point.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/mmu.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 725316df32ec..6d47b5c43246 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2029,24 +2029,27 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
struct mmu_page_path parents;
struct kvm_mmu_pages pages;
LIST_HEAD(invalid_list);
+ bool flush = false;
while (mmu_unsync_walk(parent, &pages)) {
bool protected = false;
- bool flush = false;
for_each_sp(pages, sp, parents, i)
protected |= rmap_write_protect(vcpu, sp->gfn);
- if (protected)
+ if (protected) {
kvm_flush_remote_tlbs(vcpu->kvm);
+ flush = false;
+ }
for_each_sp(pages, sp, parents, i) {
flush |= kvm_sync_page(vcpu, sp, &invalid_list);
mmu_pages_clear_parents(&parents);
}
- kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
cond_resched_lock(&vcpu->kvm->mmu_lock);
}
+
+ kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
}
static void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
--
1.8.3.1
* [PATCH 10/12] KVM: mark memory barrier with smp_mb__after_atomic
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti
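kvm_make_all_cpus_request sets the request bit via kvm_make_request,
i.e. an atomic set_bit, immediately before reading ->mode, so the
cheaper smp_mb__after_atomic provides the required ordering; on x86 it
reduces to a compiler barrier, because atomic read-modify-write
instructions already imply a full barrier. A sketch of the pairing this
relies on:

	/*
	 * requesting CPU                    vCPU entering the guest
	 * --------------------------------  -----------------------------
	 * set_bit(req, &vcpu->requests);    vcpu->mode = IN_GUEST_MODE;
	 * smp_mb__after_atomic();           full barrier;
	 * read vcpu->mode, kick if needed   check vcpu->requests
	 */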
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
virt/kvm/kvm_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 39c36d4f4f5c..77a73739b08e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -171,7 +171,7 @@ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
cpu = vcpu->cpu;
/* Set ->requests bit before we read ->mode */
- smp_mb();
+ smp_mb__after_atomic();
if (cpus != NULL && cpu != -1 && cpu != me &&
kvm_vcpu_exiting_guest_mode(vcpu) != OUTSIDE_GUEST_MODE)
--
1.8.3.1
* [PATCH 11/12] KVM: MMU: simplify last_pte_bitmap
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti
Branch-free code is fun and everybody knows how much Avi loves it,
but last_pte_bitmap takes it a bit too far. A logical | is
still branch-free and simpler than building the whole truth table into
last_pte_bitmap.
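To see how the two pieces fit together, here is a worked example for a
64-bit guest (root_level == 4); the numbers follow directly from the
code below:

	/*
	 * PT_PAGE_SIZE_MASK is bit 7 (0x80) and large pages exist at
	 * levels 2 and 3, so update_last_pte_bitmap() computes:
	 *
	 *   map = (0x80 >> 2) | (0x80 >> 3) = 0x20 | 0x10 = 0x30
	 *
	 * is_last_gpte() at level 3 with gpte.PS set:
	 *
	 *   m1 = 3 - 2 = 1               -> bit 7 clear
	 *   m2 = gpte & (0x30 << 3)      -> keeps bit 7, the PS bit
	 *   (m1 | m2) & 0x80             -> nonzero: last gpte
	 *
	 * At level 4 the shift moves both map bits above bit 7, so PS is
	 * never honored there; at level 1, m1 wraps to -1 and supplies
	 * bit 7 directly, terminating the walk unconditionally.
	 */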
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/include/asm/kvm_host.h | 6 +-----
arch/x86/kvm/mmu.c | 40 +++++++++++++++++++++++++---------------
2 files changed, 26 insertions(+), 20 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7b5459982433..3b4c0b98f285 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -338,11 +338,7 @@ struct kvm_mmu {
struct rsvd_bits_validate guest_rsvd_check;
- /*
- * Bitmap: bit set = last pte in walk
- * index[0:1]: level (zero-based)
- * index[2]: pte.ps
- */
+ /* bit [7-n] = can have large pages at (one-based) level n */
u8 last_pte_bitmap;
bool nx;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6d47b5c43246..6bf8c36b681b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3566,13 +3566,21 @@ static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
return false;
}
-static inline bool is_last_gpte(struct kvm_mmu *mmu, unsigned level, unsigned gpte)
+static inline bool is_last_gpte(struct kvm_mmu *mmu,
+ unsigned level, unsigned gpte)
{
- unsigned index;
+ unsigned m1, m2;
+
+ /* PT_PAGE_TABLE_LEVEL always terminates. m1 has bit 7 set iff
+ * level <= PT_PAGE_TABLE_LEVEL, which for our purpose means
+ * level == PT_PAGE_TABLE_LEVEL (level is never zero). */
+ m1 = level - (PT_PAGE_TABLE_LEVEL + 1);
- index = level - 1;
- index |= (gpte & PT_PAGE_SIZE_MASK) >> (PT_PAGE_SIZE_SHIFT - 2);
- return mmu->last_pte_bitmap & (1 << index);
+ /* last_pte_bitmap is built so that shifting it left by level
+ * produces a 1-bit mask for gpte's PT_PAGE_SIZE_MASK bit.
+ */
+ m2 = gpte & (mmu->last_pte_bitmap << level);
+ return (m1 | m2) & PT_PAGE_SIZE_MASK;
}
#define PTTYPE_EPT 18 /* arbitrary */
@@ -3848,16 +3856,18 @@ static void update_last_pte_bitmap(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
{
u8 map;
unsigned level, root_level = mmu->root_level;
- const unsigned ps_set_index = 1 << 2; /* bit 2 of index: ps */
-
- if (root_level == PT32E_ROOT_LEVEL)
- --root_level;
- /* PT_PAGE_TABLE_LEVEL always terminates */
- map = 1 | (1 << ps_set_index);
- for (level = PT_DIRECTORY_LEVEL; level <= root_level; ++level) {
- if (level <= PT_PDPE_LEVEL
- && (mmu->root_level >= PT32E_ROOT_LEVEL || is_pse(vcpu)))
- map |= 1 << (ps_set_index | (level - 1));
+
+ map = 0;
+ if (root_level >= PT32E_ROOT_LEVEL || is_pse(vcpu)) {
+ /* Apart from PSE, there are never large pages at the top level
+ * (512GB for 64-bit, 1GB for PAE).
+ */
+ if (root_level != PT32_ROOT_LEVEL)
+ --root_level;
+
+ BUILD_BUG_ON(PT_PAGE_SIZE_MASK > 255);
+ for (level = PT_DIRECTORY_LEVEL; level <= root_level; ++level)
+ map |= PT_PAGE_SIZE_MASK >> level;
}
mmu->last_pte_bitmap = map;
}
--
1.8.3.1
* [PATCH 12/12] KVM: MMU: micro-optimize gpte_access
From: Paolo Bonzini @ 2016-02-24 13:17 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti
Avoid AND-NOT; most x86 processors lack an instruction for it.
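To spell the trick out (a worked example; the constants are the ones
asserted by the BUILD_BUG_ONs below, and a gpte that reaches this point
is always present):

	/*
	 * ACC_EXEC_MASK == PT_PRESENT_MASK == 1, so bit 0 of "access"
	 * starts out as 1, and gpte >> PT64_NX_SHIFT is the NX bit:
	 *
	 *   NX clear:  bit 0 = 1 ^ 0 = 1   -> executable
	 *   NX set:    bit 0 = 1 ^ 1 = 0   -> not executable
	 *
	 * The XOR thus gives the same result as the old
	 * "access &= ~(gpte >> PT64_NX_SHIFT)" without an AND-NOT.
	 */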
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/paging_tmpl.h | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 2e019b103249..3c77bdd443d9 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -189,8 +189,11 @@ static inline unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, u64 gpte)
((gpte & VMX_EPT_EXECUTABLE_MASK) ? ACC_EXEC_MASK : 0) |
ACC_USER_MASK;
#else
- access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
- access &= ~(gpte >> PT64_NX_SHIFT);
+ BUILD_BUG_ON(ACC_EXEC_MASK != PT_PRESENT_MASK);
+ BUILD_BUG_ON(ACC_EXEC_MASK != 1);
+ access = gpte & (PT_WRITABLE_MASK | PT_USER_MASK | PT_PRESENT_MASK);
+ /* Combine NX with P (which is set here) to get ACC_EXEC_MASK. */
+ access ^= (gpte >> PT64_NX_SHIFT);
#endif
return access;
--
1.8.3.1
* Re: [PATCH 01/12] KVM: MMU: Fix ubsan warnings
From: Mike Krinkin @ 2016-02-24 13:42 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel, kvm, yoshikawa_takuya_b1, guangrong.xiao, mtosatti
Could you also merge a fix for the following ubsan warning (I reported
this one too, but it attracted a bit less attention):
[ 168.791851] ================================================================================
[ 168.791862] UBSAN: Undefined behaviour in arch/x86/kvm/paging_tmpl.h:252:15
[ 168.791866] index 4 is out of range for type 'u64 [4]'
[ 168.791871] CPU: 0 PID: 2950 Comm: qemu-system-x86 Tainted: G O L 4.5.0-rc5-next-20160222 #7
[ 168.791873] Hardware name: LENOVO 23205NG/23205NG, BIOS G2ET95WW (2.55 ) 07/09/2013
[ 168.791876] 0000000000000000 ffff8801cfcaf208 ffffffff81c9f780 0000000041b58ab3
[ 168.791882] ffffffff82eb2cc1 ffffffff81c9f6b4 ffff8801cfcaf230 ffff8801cfcaf1e0
[ 168.791886] 0000000000000004 0000000000000001 0000000000000000 ffffffffa1981600
[ 168.791891] Call Trace:
[ 168.791899] [<ffffffff81c9f780>] dump_stack+0xcc/0x12c
[ 168.791904] [<ffffffff81c9f6b4>] ? _atomic_dec_and_lock+0xc4/0xc4
[ 168.791910] [<ffffffff81da9e81>] ubsan_epilogue+0xd/0x8a
[ 168.791914] [<ffffffff81daafa2>] __ubsan_handle_out_of_bounds+0x15c/0x1a3
[ 168.791918] [<ffffffff81daae46>] ? __ubsan_handle_shift_out_of_bounds+0x2bd/0x2bd
[ 168.791922] [<ffffffff811287ef>] ? get_user_pages_fast+0x2bf/0x360
[ 168.791954] [<ffffffffa1794050>] ? kvm_largepages_enabled+0x30/0x30 [kvm]
[ 168.791958] [<ffffffff81128530>] ? __get_user_pages_fast+0x360/0x360
[ 168.791987] [<ffffffffa181b818>] paging64_walk_addr_generic+0x1b28/0x2600 [kvm]
[ 168.792014] [<ffffffffa1819cf0>] ? init_kvm_mmu+0x1100/0x1100 [kvm]
[ 168.792019] [<ffffffff8129e350>] ? debug_check_no_locks_freed+0x350/0x350
[ 168.792044] [<ffffffffa1819cf0>] ? init_kvm_mmu+0x1100/0x1100 [kvm]
[ 168.792076] [<ffffffffa181c36d>] paging64_gva_to_gpa+0x7d/0x110 [kvm]
[ 168.792121] [<ffffffffa181c2f0>] ? paging64_walk_addr_generic+0x2600/0x2600 [kvm]
[ 168.792130] [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
[ 168.792178] [<ffffffffa17d9a4a>] emulator_read_write_onepage+0x27a/0x1150 [kvm]
[ 168.792208] [<ffffffffa1794d44>] ? __kvm_read_guest_page+0x54/0x70 [kvm]
[ 168.792234] [<ffffffffa17d97d0>] ? kvm_task_switch+0x160/0x160 [kvm]
[ 168.792238] [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
[ 168.792263] [<ffffffffa17daa07>] emulator_read_write+0xe7/0x6d0 [kvm]
[ 168.792290] [<ffffffffa183b620>] ? em_cr_write+0x230/0x230 [kvm]
[ 168.792314] [<ffffffffa17db005>] emulator_write_emulated+0x15/0x20 [kvm]
[ 168.792340] [<ffffffffa18465f8>] segmented_write+0xf8/0x130 [kvm]
[ 168.792367] [<ffffffffa1846500>] ? em_lgdt+0x20/0x20 [kvm]
[ 168.792374] [<ffffffffa14db512>] ? vmx_read_guest_seg_ar+0x42/0x1e0 [kvm_intel]
[ 168.792400] [<ffffffffa1846d82>] writeback+0x3f2/0x700 [kvm]
[ 168.792424] [<ffffffffa1846990>] ? em_sidt+0xa0/0xa0 [kvm]
[ 168.792449] [<ffffffffa185554d>] ? x86_decode_insn+0x1b3d/0x4f70 [kvm]
[ 168.792474] [<ffffffffa1859032>] x86_emulate_insn+0x572/0x3010 [kvm]
[ 168.792499] [<ffffffffa17e71dd>] x86_emulate_instruction+0x3bd/0x2110 [kvm]
[ 168.792524] [<ffffffffa17e6e20>] ? reexecute_instruction.part.110+0x2e0/0x2e0 [kvm]
[ 168.792532] [<ffffffffa14e9a81>] handle_ept_misconfig+0x61/0x460 [kvm_intel]
[ 168.792539] [<ffffffffa14e9a20>] ? handle_pause+0x450/0x450 [kvm_intel]
[ 168.792546] [<ffffffffa15130ea>] vmx_handle_exit+0xd6a/0x1ad0 [kvm_intel]
[ 168.792572] [<ffffffffa17f6a6c>] ? kvm_arch_vcpu_ioctl_run+0xbdc/0x6090 [kvm]
[ 168.792597] [<ffffffffa17f6bcd>] kvm_arch_vcpu_ioctl_run+0xd3d/0x6090 [kvm]
[ 168.792621] [<ffffffffa17f6a6c>] ? kvm_arch_vcpu_ioctl_run+0xbdc/0x6090 [kvm]
[ 168.792627] [<ffffffff8293b530>] ? __ww_mutex_lock_interruptible+0x1630/0x1630
[ 168.792651] [<ffffffffa17f5e90>] ? kvm_arch_vcpu_runnable+0x4f0/0x4f0 [kvm]
[ 168.792656] [<ffffffff811eeb30>] ? preempt_notifier_unregister+0x190/0x190
[ 168.792681] [<ffffffffa17e0447>] ? kvm_arch_vcpu_load+0x127/0x650 [kvm]
[ 168.792704] [<ffffffffa178e9a3>] kvm_vcpu_ioctl+0x553/0xda0 [kvm]
[ 168.792727] [<ffffffffa178e450>] ? vcpu_put+0x40/0x40 [kvm]
[ 168.792732] [<ffffffff8129e350>] ? debug_check_no_locks_freed+0x350/0x350
[ 168.792735] [<ffffffff82946087>] ? _raw_spin_unlock+0x27/0x40
[ 168.792740] [<ffffffff8163a943>] ? handle_mm_fault+0x1673/0x2e40
[ 168.792744] [<ffffffff8129daa8>] ? trace_hardirqs_on_caller+0x478/0x6c0
[ 168.792747] [<ffffffff8129dcfd>] ? trace_hardirqs_on+0xd/0x10
[ 168.792751] [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
[ 168.792756] [<ffffffff81725a80>] do_vfs_ioctl+0x1b0/0x12b0
[ 168.792759] [<ffffffff817258d0>] ? ioctl_preallocate+0x210/0x210
[ 168.792763] [<ffffffff8174aef3>] ? __fget+0x273/0x4a0
[ 168.792766] [<ffffffff8174acd0>] ? __fget+0x50/0x4a0
[ 168.792770] [<ffffffff8174b1f6>] ? __fget_light+0x96/0x2b0
[ 168.792773] [<ffffffff81726bf9>] SyS_ioctl+0x79/0x90
[ 168.792777] [<ffffffff82946880>] entry_SYSCALL_64_fastpath+0x23/0xc1
[ 168.792780] ================================================================================
This seems to be a typo: walker->ptes is indexed by level - 1 everywhere
else in the walker, since levels are one-based. The following fix solves
the problem for me:
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 6c9fed9..2ce4f05 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -249,7 +249,7 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
return ret;
kvm_vcpu_mark_page_dirty(vcpu, table_gfn);
- walker->ptes[level] = pte;
+ walker->ptes[level - 1] = pte;
}
return 0;
}
On Wed, Feb 24, 2016 at 02:17:42PM +0100, Paolo Bonzini wrote:
> [...]
* Re: [PATCH 01/12] KVM: MMU: Fix ubsan warnings
From: Paolo Bonzini @ 2016-02-24 13:43 UTC (permalink / raw)
To: Mike Krinkin
Cc: linux-kernel, kvm, yoshikawa_takuya_b1, guangrong.xiao, mtosatti
On 24/02/2016 14:42, Mike Krinkin wrote:
> Could you also merge a fix for the following ubsan warning (I reported
> this one too, but it attracted a bit less attention):
>
> [...]
>
> This seems to be a typo, and the following fix solves the problem for me:
>
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 6c9fed9..2ce4f05 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -249,7 +249,7 @@ static int FNAME(update_accessed_dirty_bits)(struct kvm_vcpu *vcpu,
> return ret;
>
> kvm_vcpu_mark_page_dirty(vcpu, table_gfn);
> - walker->ptes[level] = pte;
> + walker->ptes[level - 1] = pte;
> }
> return 0;
> }
This patch is correct---good catch! Can you resend it with a
"Signed-off-by: Mike Krinkin <krinkin.m.u@gmail.com>" line?
Thanks,
Paolo
>> struct kvm_mmu_pages pages;
>> LIST_HEAD(invalid_list);
>>
>> - kvm_mmu_pages_init(parent, &parents, &pages);
>> while (mmu_unsync_walk(parent, &pages)) {
>> bool protected = false;
>>
>> @@ -2037,7 +2049,6 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
>> }
>> kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
>> cond_resched_lock(&vcpu->kvm->mmu_lock);
>> - kvm_mmu_pages_init(parent, &parents, &pages);
>> }
>> }
>>
>> @@ -2269,7 +2280,6 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
>> if (parent->role.level == PT_PAGE_TABLE_LEVEL)
>> return 0;
>>
>> - kvm_mmu_pages_init(parent, &parents, &pages);
>> while (mmu_unsync_walk(parent, &pages)) {
>> struct kvm_mmu_page *sp;
>>
>> @@ -2278,7 +2288,6 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
>> mmu_pages_clear_parents(&parents);
>> zapped++;
>> }
>> - kvm_mmu_pages_init(parent, &parents, &pages);
>> }
>>
>> return zapped;
>> --
>> 1.8.3.1
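To make the sentinel easier to picture, here is an illustration (not
taken from the patch) of the mmu_page_path layout for a walk rooted at
a level-3 shadow page, with PT64_ROOT_LEVEL == 4:

/*
 * Illustration only -- state after mmu_pages_first() and
 * mmu_pages_next() have run for a walk rooted at a level-3 sp:
 *
 *   parent[2] = NULL         <- sentinel from mmu_pages_first()
 *   parent[1] = level-3 root <- set by mmu_pages_first()
 *   parent[0] = level-2 sp   <- filled in by mmu_pages_next()
 *   idx[1]    = index of the level-2 sp within the root
 *   idx[0]    = index of the level-1 sp within the level-2 sp
 *
 * mmu_pages_clear_parents() starts at parent[0] and walks upwards,
 * clearing unsync child bits, until it finds a page that still has
 * unsync children or hits the NULL sentinel; no bounds check on the
 * level is needed.
 */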
* Re: [PATCH 09/12] KVM: MMU: coalesce zapping page after mmu_sync_children
2016-02-24 13:17 ` [PATCH 09/12] KVM: MMU: coalesce zapping page after mmu_sync_children Paolo Bonzini
@ 2016-02-25 2:15 ` Takuya Yoshikawa
2016-02-25 7:35 ` Xiao Guangrong
2016-02-25 8:46 ` Paolo Bonzini
0 siblings, 2 replies; 26+ messages in thread
From: Takuya Yoshikawa @ 2016-02-25 2:15 UTC (permalink / raw)
To: Paolo Bonzini, linux-kernel, kvm; +Cc: guangrong.xiao, mtosatti
On 2016/02/24 22:17, Paolo Bonzini wrote:
> Move the call to kvm_mmu_flush_or_zap outside the loop.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/kvm/mmu.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 725316df32ec..6d47b5c43246 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2029,24 +2029,27 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
> struct mmu_page_path parents;
> struct kvm_mmu_pages pages;
> LIST_HEAD(invalid_list);
> + bool flush = false;
>
> while (mmu_unsync_walk(parent, &pages)) {
> bool protected = false;
> - bool flush = false;
>
> for_each_sp(pages, sp, parents, i)
> protected |= rmap_write_protect(vcpu, sp->gfn);
>
> - if (protected)
> + if (protected) {
> kvm_flush_remote_tlbs(vcpu->kvm);
> + flush = false;
> + }
>
> for_each_sp(pages, sp, parents, i) {
> flush |= kvm_sync_page(vcpu, sp, &invalid_list);
> mmu_pages_clear_parents(&parents);
> }
> - kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
> cond_resched_lock(&vcpu->kvm->mmu_lock);
This may release the mmu_lock before committing the zapping.
Is it safe? If so, we may want to see the reason in the changelog.
Takuya
> }
> +
> + kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
> }
>
> static void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
>
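For context, kvm_mmu_flush_or_zap is introduced by patch 3 of this
series. A sketch of its expected behavior, reconstructed from the
discussion rather than quoted from the patch:

static void kvm_mmu_flush_or_zap(struct kvm_vcpu *vcpu,
				 struct list_head *invalid_list,
				 bool remote_flush, bool local_flush)
{
	if (!list_empty(invalid_list)) {
		/* Committing the zap also flushes remote TLBs. */
		kvm_mmu_commit_zap_page(vcpu->kvm, invalid_list);
		return;
	}

	if (remote_flush)
		kvm_flush_remote_tlbs(vcpu->kvm);
	else if (local_flush)
		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
}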
* Re: [PATCH 08/12] KVM: MMU: move zap/flush to kvm_mmu_get_page
2016-02-24 13:17 ` [PATCH 08/12] KVM: MMU: move zap/flush to kvm_mmu_get_page Paolo Bonzini
@ 2016-02-25 7:32 ` Xiao Guangrong
2016-02-25 8:48 ` Paolo Bonzini
0 siblings, 1 reply; 26+ messages in thread
From: Xiao Guangrong @ 2016-02-25 7:32 UTC (permalink / raw)
To: Paolo Bonzini, linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, mtosatti
On 02/24/2016 09:17 PM, Paolo Bonzini wrote:
> kvm_mmu_get_page is the only caller of kvm_sync_page_transient
> and kvm_sync_pages. Moving the handling of the invalid_list there
> removes the need for the underdocumented kvm_sync_page_transient
> function.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> Guangrong, at this point I am confused about why
> kvm_sync_page_transient didn't clear sp->unsync. Do
> you remember? Or perhaps kvm_mmu_get_page could just
> call kvm_sync_page now?
>
It is an optimization to reduce write protection: turning an unsync
spte into a sync one requires write-protecting the page and syncing
all sptes pointing to the same gfn.
However, after syncing the unsync spte's content with the guest pte,
we can reuse the spte as it is.
> Also, can you explain the need_sync variable in
> kvm_mmu_get_page?
This is because we need to preserve the invariant that only last-level
(level = 1) sptes can be unsync: when a spte on an upper level is
created, we must eliminate all the unsync sptes pointing to the same
gfn.
As you have already merged this patchset into the kvm tree, I will post
a patch that comments these cases to make the code more understandable.
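In code terms, the invariant reads roughly as follows -- a hedged
sketch of the idea, not the actual kvm_mmu_get_page body; assume
need_sync has been accumulated while scanning the existing shadow
pages for the gfn:

/* Sketch only: unsync sptes may exist only at the last level, so
 * everything for the gfn must be synced before an upper-level spte
 * is created. */
static void enforce_unsync_invariant(struct kvm_vcpu *vcpu, gfn_t gfn,
				     int level, bool need_sync)
{
	/* need_sync was built up as: need_sync |= sp->unsync; */
	if (level > PT_PAGE_TABLE_LEVEL && need_sync)
		kvm_sync_pages(vcpu, gfn);
}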
* Re: [PATCH 09/12] KVM: MMU: coalesce zapping page after mmu_sync_children
2016-02-25 2:15 ` Takuya Yoshikawa
@ 2016-02-25 7:35 ` Xiao Guangrong
2016-02-25 8:49 ` Paolo Bonzini
2016-02-25 8:46 ` Paolo Bonzini
1 sibling, 1 reply; 26+ messages in thread
From: Xiao Guangrong @ 2016-02-25 7:35 UTC (permalink / raw)
To: Takuya Yoshikawa, Paolo Bonzini, linux-kernel, kvm; +Cc: mtosatti
On 02/25/2016 10:15 AM, Takuya Yoshikawa wrote:
> On 2016/02/24 22:17, Paolo Bonzini wrote:
>> Move the call to kvm_mmu_flush_or_zap outside the loop.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>> arch/x86/kvm/mmu.c | 9 ++++++---
>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 725316df32ec..6d47b5c43246 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -2029,24 +2029,27 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
>> struct mmu_page_path parents;
>> struct kvm_mmu_pages pages;
>> LIST_HEAD(invalid_list);
>> + bool flush = false;
>>
>> while (mmu_unsync_walk(parent, &pages)) {
>> bool protected = false;
>> - bool flush = false;
>>
>> for_each_sp(pages, sp, parents, i)
>> protected |= rmap_write_protect(vcpu, sp->gfn);
>>
>> - if (protected)
>> + if (protected) {
>> kvm_flush_remote_tlbs(vcpu->kvm);
>> + flush = false;
>> + }
>>
>> for_each_sp(pages, sp, parents, i) {
>> flush |= kvm_sync_page(vcpu, sp, &invalid_list);
>> mmu_pages_clear_parents(&parents);
>> }
>> - kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
>> cond_resched_lock(&vcpu->kvm->mmu_lock);
>
> This may release the mmu_lock before committing the zapping.
> Is it safe? If so, we may want to see the reason in the changelog.
It is unsafe indeed, please do not do it.
* Re: [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations
2016-02-24 13:17 [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Paolo Bonzini
` (11 preceding siblings ...)
2016-02-24 13:17 ` [PATCH 12/12] KVM: MMU: micro-optimize gpte_access Paolo Bonzini
@ 2016-02-25 8:28 ` Xiao Guangrong
2016-02-25 8:49 ` Paolo Bonzini
2016-03-04 21:43 ` Paolo Bonzini
13 siblings, 1 reply; 26+ messages in thread
From: Xiao Guangrong @ 2016-02-25 8:28 UTC (permalink / raw)
To: Paolo Bonzini, linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, mtosatti
On 02/24/2016 09:17 PM, Paolo Bonzini wrote:
> This series started from looking at mmu_unsync_walk for the ubsan thread.
> Patches 1 and 2 are the result of the discussions in that thread.
>
> Patches 3 to 9 do more cleanups in __kvm_sync_page and its callers.
> Among other changes, it removes kvm_sync_page_transient and avoids
> duplicate code between __kvm_sync_page and kvm_sync_pages.
>
> I stopped where I had questions about the existing kvm_mmu_get_page
> code (see patch 8 for the question). However perhaps more cleanups
> are possible, also thanks to Takuya's work on that function and
> link_shadow_page.
>
> Patches 10 to 12 are just micro-optimizations.
>
> Guangrong, it would be great if you took a look since you know this part
> of KVM very well.
I have reviewed the series and it works fine, except for the patch that
leaks a TLB flush outside of the mmu-lock.
I will continue to simplify the unsync shadow page walk to keep
mmu_page_path smaller, and will add comments to kvm_mmu_get_page on top
of this patchset.
BTW, is there any conflict in applying my page-tracking patchset on top
of this one (I noticed you've merged this patchset into kvm/queue)?
Please tell me if it needs a rebase.
* Re: [PATCH 09/12] KVM: MMU: coalesce zapping page after mmu_sync_children
2016-02-25 2:15 ` Takuya Yoshikawa
2016-02-25 7:35 ` Xiao Guangrong
@ 2016-02-25 8:46 ` Paolo Bonzini
1 sibling, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-25 8:46 UTC (permalink / raw)
To: Takuya Yoshikawa, linux-kernel, kvm; +Cc: guangrong.xiao, mtosatti
On 25/02/2016 03:15, Takuya Yoshikawa wrote:
> On 2016/02/24 22:17, Paolo Bonzini wrote:
>> Move the call to kvm_mmu_flush_or_zap outside the loop.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>> arch/x86/kvm/mmu.c | 9 ++++++---
>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 725316df32ec..6d47b5c43246 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -2029,24 +2029,27 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
>> struct mmu_page_path parents;
>> struct kvm_mmu_pages pages;
>> LIST_HEAD(invalid_list);
>> + bool flush = false;
>>
>> while (mmu_unsync_walk(parent, &pages)) {
>> bool protected = false;
>> - bool flush = false;
>>
>> for_each_sp(pages, sp, parents, i)
>> protected |= rmap_write_protect(vcpu, sp->gfn);
>>
>> - if (protected)
>> + if (protected) {
>> kvm_flush_remote_tlbs(vcpu->kvm);
>> + flush = false;
>> + }
>>
>> for_each_sp(pages, sp, parents, i) {
>> flush |= kvm_sync_page(vcpu, sp, &invalid_list);
>> mmu_pages_clear_parents(&parents);
>> }
>> - kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
>> cond_resched_lock(&vcpu->kvm->mmu_lock);
>
> This may release the mmu_lock before committing the zapping.
> Is it safe? If so, we may want to see the reason in the changelog.
It should be safe; the page is already marked as invalid and hence the
role will not match in kvm_mmu_get_page.
The idea is simply that committing the zap is expensive (for example it
requires a remote TLB flush) so you want to do it as rarely as possible.
I'll note this in the commit message.
Paolo
> Takuya
>
>> }
>> +
>> + kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
>> }
>>
>> static void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
>>
>
>
>
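A sketch of the mechanism Paolo refers to (illustrative field names;
the real kvm_mmu_page_role carries more bits): the role is a packed
union compared as a whole word, so a page marked invalid can never
match the role computed for a new page, which always has invalid == 0.

/* Illustration only -- simplified version of the role union. */
union role_sketch {
	unsigned word;
	struct {
		unsigned level:4;
		unsigned invalid:1;
		/* ... more role bits ... */
	};
};

static bool role_matches(union role_sketch a, union role_sketch b)
{
	/* invalid == 1 never equals invalid == 0, so zapped-but-
	 * uncommitted pages are skipped by the lookup. */
	return a.word == b.word;
}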
* Re: [PATCH 08/12] KVM: MMU: move zap/flush to kvm_mmu_get_page
2016-02-25 7:32 ` Xiao Guangrong
@ 2016-02-25 8:48 ` Paolo Bonzini
0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-25 8:48 UTC (permalink / raw)
To: Xiao Guangrong, linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, mtosatti
On 25/02/2016 08:32, Xiao Guangrong wrote:
>
> As you have already merged this patchset into the kvm tree, I will post
> a patch that comments these cases to make the code more understandable.
I've only merged it to kvm/queue so that it gets into all my testing
(and the buildbot's). I won't move it to kvm/next until I get reviews.
Paolo
* Re: [PATCH 09/12] KVM: MMU: coalesce zapping page after mmu_sync_children
2016-02-25 7:35 ` Xiao Guangrong
@ 2016-02-25 8:49 ` Paolo Bonzini
2016-02-25 9:10 ` Xiao Guangrong
0 siblings, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-25 8:49 UTC (permalink / raw)
To: Xiao Guangrong, Takuya Yoshikawa, linux-kernel, kvm; +Cc: mtosatti
On 25/02/2016 08:35, Xiao Guangrong wrote:
>> This may release the mmu_lock before committing the zapping.
>> Is it safe? If so, we may want to see the reason in the changelog.
>
> It is unsafe indeed, please do not do it.
Can you explain why? kvm_zap_obsolete_pages does the same.
Paolo
* Re: [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations
2016-02-25 8:28 ` [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Xiao Guangrong
@ 2016-02-25 8:49 ` Paolo Bonzini
0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-25 8:49 UTC (permalink / raw)
To: Xiao Guangrong, linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, mtosatti
On 25/02/2016 09:28, Xiao Guangrong wrote:
>
> BTW, is there any conflict in applying my page-tracking patchset on top
> of this one (I noticed you've merged this patchset into kvm/queue)?
> Please tell me if it needs a rebase.
No, there shouldn't be any conflict.
Paolo
* Re: [PATCH 09/12] KVM: MMU: coalesce zapping page after mmu_sync_children
2016-02-25 8:49 ` Paolo Bonzini
@ 2016-02-25 9:10 ` Xiao Guangrong
2016-02-25 9:55 ` Paolo Bonzini
0 siblings, 1 reply; 26+ messages in thread
From: Xiao Guangrong @ 2016-02-25 9:10 UTC (permalink / raw)
To: Paolo Bonzini, Takuya Yoshikawa, linux-kernel, kvm; +Cc: mtosatti
On 02/25/2016 04:49 PM, Paolo Bonzini wrote:
>
>
> On 25/02/2016 08:35, Xiao Guangrong wrote:
>>> This may release the mmu_lock before committing the zapping.
>>> Is it safe? If so, we may want to see the reason in the changelog.
>>
>> It is unsafe indeed, please do not do it.
>
> Can you explain why? kvm_zap_obsolete_pages does the same.
It's not the same; please see the comment in kvm_mmu_invalidate_zap_all_pages:
/*
* Notify all vcpus to reload its shadow page table
* and flush TLB. Then all vcpus will switch to new
* shadow page table with the new mmu_valid_gen.
*
* Note: we should do this under the protection of
* mmu-lock, otherwise, vcpu would purge shadow page
* but miss tlb flush.
*/
kvm_reload_remote_mmus(kvm);
That means the TLB is flushed before the mmu-lock is released.
An example is in rmap_write_protect(): when KVM creates a shadow page
table for the guest, it would detect no spte pointing to the gfn, so
the TLB would not be flushed and the guest could freely update its pte.
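A minimal sketch of the safe ordering being described, assuming the
standard prepare/commit zap API (illustration, not code from any patch
in this series):

/* Sketch: the zap must be committed -- which performs the remote TLB
 * flush -- before the mmu-lock is dropped. */
static void zap_page_safely(struct kvm *kvm, struct kvm_mmu_page *sp)
{
	LIST_HEAD(invalid_list);

	spin_lock(&kvm->mmu_lock);
	kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
	kvm_mmu_commit_zap_page(kvm, &invalid_list);
	spin_unlock(&kvm->mmu_lock);	/* only now is it safe */
}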
* Re: [PATCH 09/12] KVM: MMU: coalesce zapping page after mmu_sync_children
2016-02-25 9:10 ` Xiao Guangrong
@ 2016-02-25 9:55 ` Paolo Bonzini
0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-02-25 9:55 UTC (permalink / raw)
To: Xiao Guangrong, Takuya Yoshikawa, linux-kernel, kvm; +Cc: mtosatti
On 25/02/2016 10:10, Xiao Guangrong wrote:
>
>
> On 02/25/2016 04:49 PM, Paolo Bonzini wrote:
>>
>>
>> On 25/02/2016 08:35, Xiao Guangrong wrote:
>>>> This may release the mmu_lock before committing the zapping.
>>>> Is it safe? If so, we may want to see the reason in the changelog.
>>>
>>> It is unsafe indeed, please do not do it.
>>
>> Can you explain why? kvm_zap_obsolete_pages does the same.
>
> It's not the same; please see the comment in
> kvm_mmu_invalidate_zap_all_pages:
> /*
> * Notify all vcpus to reload its shadow page table
> * and flush TLB. Then all vcpus will switch to new
> * shadow page table with the new mmu_valid_gen.
> *
> * Note: we should do this under the protection of
> * mmu-lock, otherwise, vcpu would purge shadow page
> * but miss tlb flush.
> */
> kvm_reload_remote_mmus(kvm);
>
> That means the tlb is flushed before releasing mmu-lock.
>
> A example is in rmap_write_protect(), when KVM creates a shadow page
> table for
> the the guest, it detects no spte pointing to the gfn, so tlb is not
> flushed so
> that guest can freely updates its pte.
Then I'll do a different patch that checks need_resched||spin_needbreak,
and if so does commit+cond_resched_lock. I've removed 9/12 from
kvm/queue. Again, sorry for giving the impression that these patches
were already final.
Paolo
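A sketch of what that alternative could look like inside
mmu_sync_children's loop -- an assumption about the follow-up patch,
not quoted from it:

/* Replacing the unconditional cond_resched_lock() -- sketch only:
 * commit the pending zaps/flushes only when the lock really has to
 * be dropped. */
if (need_resched() || spin_needbreak(&vcpu->kvm->mmu_lock)) {
	kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
	flush = false;
	cond_resched_lock(&vcpu->kvm->mmu_lock);
}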
* Re: [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations
2016-02-24 13:17 [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Paolo Bonzini
` (12 preceding siblings ...)
2016-02-25 8:28 ` [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Xiao Guangrong
@ 2016-03-04 21:43 ` Paolo Bonzini
13 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2016-03-04 21:43 UTC (permalink / raw)
To: linux-kernel, kvm; +Cc: yoshikawa_takuya_b1, guangrong.xiao, mtosatti
On 24/02/2016 14:17, Paolo Bonzini wrote:
> This series started from looking at mmu_unsync_walk for the ubsan thread.
> Patches 1 and 2 are the result of the discussions in that thread.
>
> Patches 3 to 9 do more cleanups in __kvm_sync_page and its callers.
> Among other changes, it removes kvm_sync_page_transient and avoids
> duplicate code between __kvm_sync_page and kvm_sync_pages.
>
> I stopped where I had questions about the existing kvm_mmu_get_page
> code (see patch 8 for the question). However perhaps more cleanups
> are possible, also thanks to Takuya's work on that function and
> link_shadow_page.
>
> Patches 10 to 12 are just micro-optimizations.
>
> Guangrong, it would be great if you took a look since you know this part
> of KVM very well.
>
> I have tested this series minus patch 9, and it survived installation
> of various Linux and Windows guests with EPT disabled. Of course before
> committing the patches I will retest with patch 9 included.
>
> Paolo
>
> Paolo Bonzini (11):
> KVM: MMU: Fix ubsan warnings
> KVM: MMU: introduce kvm_mmu_flush_or_zap
> KVM: MMU: move TLB flush out of __kvm_sync_page
> KVM: MMU: use kvm_sync_page in kvm_sync_pages
> KVM: MMU: cleanup __kvm_sync_page and its callers
> KVM: MMU: invert return value of FNAME(sync_page) and *kvm_sync_page*
> KVM: MMU: move zap/flush to kvm_mmu_get_page
> KVM: MMU: coalesce zapping page after mmu_sync_children
> KVM: mark memory barrier with smp_mb__after_atomic
> KVM: MMU: simplify last_pte_bitmap
> KVM: MMU: micro-optimize gpte_access
>
> Xiao Guangrong (1):
> KVM: MMU: check kvm_mmu_pages and mmu_page_path indices
>
> arch/x86/include/asm/kvm_host.h | 6 +-
> arch/x86/kvm/mmu.c | 216 ++++++++++++++++++++++------------------
> arch/x86/kvm/paging_tmpl.h | 11 +-
> virt/kvm/kvm_main.c | 2 +-
> 4 files changed, 126 insertions(+), 109 deletions(-)
>
I've pushed patches 1 and 2 to kvm/next now; I'll repost the others next
Monday since there were very small conflicts with the page tracking series.
Paolo
Thread overview: 26+ messages
2016-02-24 13:17 [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Paolo Bonzini
2016-02-24 13:17 ` [PATCH 01/12] KVM: MMU: Fix ubsan warnings Paolo Bonzini
2016-02-24 13:42 ` Mike Krinkin
2016-02-24 13:43 ` Paolo Bonzini
2016-02-24 13:17 ` [PATCH 02/12] KVM: MMU: check kvm_mmu_pages and mmu_page_path indices Paolo Bonzini
2016-02-24 13:17 ` [PATCH 03/12] KVM: MMU: introduce kvm_mmu_flush_or_zap Paolo Bonzini
2016-02-24 13:17 ` [PATCH 04/12] KVM: MMU: move TLB flush out of __kvm_sync_page Paolo Bonzini
2016-02-24 13:17 ` [PATCH 05/12] KVM: MMU: use kvm_sync_page in kvm_sync_pages Paolo Bonzini
2016-02-24 13:17 ` [PATCH 06/12] KVM: MMU: cleanup __kvm_sync_page and its callers Paolo Bonzini
2016-02-24 13:17 ` [PATCH 07/12] KVM: MMU: invert return value of FNAME(sync_page) and *kvm_sync_page* Paolo Bonzini
2016-02-24 13:17 ` [PATCH 08/12] KVM: MMU: move zap/flush to kvm_mmu_get_page Paolo Bonzini
2016-02-25 7:32 ` Xiao Guangrong
2016-02-25 8:48 ` Paolo Bonzini
2016-02-24 13:17 ` [PATCH 09/12] KVM: MMU: coalesce zapping page after mmu_sync_children Paolo Bonzini
2016-02-25 2:15 ` Takuya Yoshikawa
2016-02-25 7:35 ` Xiao Guangrong
2016-02-25 8:49 ` Paolo Bonzini
2016-02-25 9:10 ` Xiao Guangrong
2016-02-25 9:55 ` Paolo Bonzini
2016-02-25 8:46 ` Paolo Bonzini
2016-02-24 13:17 ` [PATCH 10/12] KVM: mark memory barrier with smp_mb__after_atomic Paolo Bonzini
2016-02-24 13:17 ` [PATCH 11/12] KVM: MMU: simplify last_pte_bitmap Paolo Bonzini
2016-02-24 13:17 ` [PATCH 12/12] KVM: MMU: micro-optimize gpte_access Paolo Bonzini
2016-02-25 8:28 ` [PATCH 00/12] KVM: MMU: cleanup around kvm_sync_page, and a few micro-optimizations Xiao Guangrong
2016-02-25 8:49 ` Paolo Bonzini
2016-03-04 21:43 ` Paolo Bonzini