* [PATCH 0/3] KVM: x86: improve reexecute_instruction
@ 2012-11-19 23:57 Xiao Guangrong
  2012-11-19 23:58 ` [PATCH 1/3] KVM: x86: clean up reexecute_instruction Xiao Guangrong
                   ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Xiao Guangrong @ 2012-11-19 23:57 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, LKML, KVM

The current reexecute_instruction can not reliably detect failed instruction
emulation. It allows the guest to retry all instructions except those that
access an error pfn.

For example, these cases can not be detected:
- when tdp is used
  currently, it refuses to retry any instruction. If nested npt is used, the
  emulation may be caused by a shadow page, which can be fixed by unshadowing
  that page.

- for the shadow mmu
  some cases are nested write-protection, for example, the page we want to
  write is used as a PDE that maps itself. In this case, we should stop the
  emulation and report the case to userspace.

The test case below, based on kvm-unit-test, triggers an infinite loop with
the current code (ept = 0); after this patchset, the error is reported to Qemu.

Marcelo, I am afraid this test case can not be added to kvm-unit-test;
autotest would be confused by it since it can abort Qemu.

Subject: [PATCH] access test: test unhandleable instruction

Test an instruction that can not be handled by kvm

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 x86/access.c |   27 ++++++++++++++++++++++++++-
 1 files changed, 26 insertions(+), 1 deletions(-)

diff --git a/x86/access.c b/x86/access.c
index 23a5995..e88db6b 100644
--- a/x86/access.c
+++ b/x86/access.c
@@ -2,6 +2,7 @@
 #include "libcflat.h"
 #include "desc.h"
 #include "processor.h"
+#include "vm.h"

 #define smp_id() 0

@@ -739,6 +740,28 @@ err:
 	return 0;
 }

+static int check_retry_unhandleable_ins(ac_pool_t *pool)
+{
+	unsigned long mem = 30 * 1024 * 1024;
+	unsigned long esp;
+	ac_test_t at;
+
+	ac_test_init(&at, (void *)(0x123406003000));
+	at.flags[AC_PDE_PRESENT] = at.flags[AC_PDE_WRITABLE] = 1;
+	at.flags[AC_PTE_PRESENT] = at.flags[AC_PTE_WRITABLE] = 1;
+	at.flags[AC_CPU_CR0_WP] = 1;
+
+	at.phys = mem;
+	ac_setup_specific_pages(&at, pool, mem, 0);
+
+	asm volatile("mov %%rsp, %%rax  \n\t" : "=a"(esp));
+	asm volatile("mov %%rax, %%rsp  \n\t" : : "a"(0x123406003000 + 0xf0));
+	asm volatile ("int $0x3 \n\t");
+	asm volatile("mov %%rax, %%rsp  \n\t" : : "a"(esp));
+
+	return 1;
+}
+
 int ac_test_exec(ac_test_t *at, ac_pool_t *pool)
 {
     int r;
@@ -756,7 +779,8 @@ const ac_test_fn ac_test_cases[] =
 {
 	corrupt_hugepage_triger,
 	check_pfec_on_prefetch_pte,
-	check_smep_andnot_wp
+	check_smep_andnot_wp,
+	check_retry_unhandleable_ins
 };

 int ac_test_run(void)
@@ -770,6 +794,7 @@ int ac_test_run(void)
     tests = successes = 0;
     ac_env_int(&pool);
     ac_test_init(&at, (void *)(0x123400000000 + 16 * smp_id()));
+
     do {
 	if (at.flags[AC_CPU_CR4_SMEP] && (ptl2[2] & 0x4))
 		ptl2[2] -= 0x4;
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 1/3] KVM: x86: clean up reexecute_instruction
  2012-11-19 23:57 [PATCH 0/3] KVM: x86: improve reexecute_instruction Xiao Guangrong
@ 2012-11-19 23:58 ` Xiao Guangrong
  2012-11-20 12:11   ` Gleb Natapov
  2012-11-19 23:59 ` [PATCH 2/3] KVM: x86: let reexecute_instruction work for tdp Xiao Guangrong
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 27+ messages in thread
From: Xiao Guangrong @ 2012-11-19 23:58 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

Small cleanup of reexecute_instruction; also use gpa_to_gfn in
retry_instruction.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/x86.c |   13 ++++++-------
 1 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 52ae8b5..7be8452 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4477,19 +4477,18 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t gva)
 	if (tdp_enabled)
 		return false;

+	gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
+	if (gpa == UNMAPPED_GVA)
+		return true; /* let cpu generate fault */
+
 	/*
 	 * if emulation was due to access to shadowed page table
 	 * and it failed try to unshadow page and re-enter the
 	 * guest to let CPU execute the instruction.
 	 */
-	if (kvm_mmu_unprotect_page_virt(vcpu, gva))
+	if (kvm_mmu_unprotect_page(vcpu->kvm, c(gpa)))
 		return true;

-	gpa = kvm_mmu_gva_to_gpa_system(vcpu, gva, NULL);
-
-	if (gpa == UNMAPPED_GVA)
-		return true; /* let cpu generate fault */
-
 	/*
 	 * Do not retry the unhandleable instruction if it faults on the
 	 * readonly host memory, otherwise it will goto a infinite loop:
@@ -4544,7 +4543,7 @@ static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
 	if (!vcpu->arch.mmu.direct_map)
 		gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2, NULL);

-	kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT);
+	kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));

 	return true;
 }
-- 
1.7.7.6



* [PATCH 2/3] KVM: x86: let reexecute_instruction work for tdp
  2012-11-19 23:57 [PATCH 0/3] KVM: x86: improve reexecute_instruction Xiao Guangrong
  2012-11-19 23:58 ` [PATCH 1/3] KVM: x86: clean up reexecute_instruction Xiao Guangrong
@ 2012-11-19 23:59 ` Xiao Guangrong
  2012-11-26 22:37   ` Marcelo Tosatti
  2012-11-19 23:59 ` [PATCH 3/3] KVM: x86: improve reexecute_instruction Xiao Guangrong
  2012-11-23  1:16 ` [PATCH 0/3] " Marcelo Tosatti
  3 siblings, 1 reply; 27+ messages in thread
From: Xiao Guangrong @ 2012-11-19 23:59 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

Currently, reexecute_instruction refuses to retry any instruction. If
nested npt is used, the emulation may be caused by a shadow page, which can
be fixed by dropping that shadow page.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/x86.c |   14 ++++++++------
 1 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7be8452..5fe72cc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4469,17 +4469,19 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu)
 	return r;
 }

-static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t gva)
+static bool reexecute_instruction(struct kvm_vcpu *vcpu, unsigned long cr2)
 {
-	gpa_t gpa;
+	gpa_t gpa = cr2;
 	pfn_t pfn;

-	if (tdp_enabled)
+	if (!ACCESS_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
 		return false;

-	gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
-	if (gpa == UNMAPPED_GVA)
-		return true; /* let cpu generate fault */
+	if (!vcpu->arch.mmu.direct_map) {
+		gpa = kvm_mmu_gva_to_gpa_read(vcpu, cr2, NULL);
+		if (gpa == UNMAPPED_GVA)
+			return true; /* let cpu generate fault */
+	}

 	/*
 	 * if emulation was due to access to shadowed page table
-- 
1.7.7.6



* [PATCH 3/3] KVM: x86: improve reexecute_instruction
  2012-11-19 23:57 [PATCH 0/3] KVM: x86: improve reexecute_instruction Xiao Guangrong
  2012-11-19 23:58 ` [PATCH 1/3] KVM: x86: clean up reexecute_instruction Xiao Guangrong
  2012-11-19 23:59 ` [PATCH 2/3] KVM: x86: let reexecute_instruction work for tdp Xiao Guangrong
@ 2012-11-19 23:59 ` Xiao Guangrong
  2012-11-26 22:41   ` Marcelo Tosatti
  2012-11-23  1:16 ` [PATCH 0/3] " Marcelo Tosatti
  3 siblings, 1 reply; 27+ messages in thread
From: Xiao Guangrong @ 2012-11-19 23:59 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

The current reexecute_instruction can not reliably detect failed instruction
emulation. It allows the guest to retry all instructions except those that
access an error pfn.

For example, some cases are nested write-protection: the page we want to
write is used as a PDE that maps itself. In this case, we should stop the
emulation and report the case to userspace.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/include/asm/kvm_host.h |    2 +
 arch/x86/kvm/paging_tmpl.h      |    2 +
 arch/x86/kvm/x86.c              |   54 ++++++++++++++++++++++++++++-----------
 3 files changed, 43 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b2e11f4..c5eb52f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -566,6 +566,8 @@ struct kvm_arch {
 	u64 hv_guest_os_id;
 	u64 hv_hypercall;

+	/* synchronizing reexecute_instruction and page fault path. */
+	u64 page_fault_count;
 	#ifdef CONFIG_KVM_MMU_AUDIT
 	int audit_point;
 	#endif
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 891eb6d..d55ad89 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -568,6 +568,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
 	if (mmu_notifier_retry(vcpu->kvm, mmu_seq))
 		goto out_unlock;

+	vcpu->kvm->arch.page_fault_count++;
+
 	kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT);
 	kvm_mmu_free_some_pages(vcpu);
 	if (!force_pt_level)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5fe72cc..2fe484b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4473,37 +4473,61 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, unsigned long cr2)
 {
 	gpa_t gpa = cr2;
 	pfn_t pfn;
-
-	if (!ACCESS_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
-		return false;
+	u64 page_fault_count;
+	int emulate;

 	if (!vcpu->arch.mmu.direct_map) {
 		gpa = kvm_mmu_gva_to_gpa_read(vcpu, cr2, NULL);
+		/*
+		 * If the mapping is invalid in guest, let cpu retry
+		 * it to generate fault.
+		 */
 		if (gpa == UNMAPPED_GVA)
-			return true; /* let cpu generate fault */
+			return true;
 	}

 	/*
-	 * if emulation was due to access to shadowed page table
-	 * and it failed try to unshadow page and re-enter the
-	 * guest to let CPU execute the instruction.
-	 */
-	if (kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa)))
-		return true;
-
-	/*
 	 * Do not retry the unhandleable instruction if it faults on the
 	 * readonly host memory, otherwise it will goto a infinite loop:
 	 * retry instruction -> write #PF -> emulation fail -> retry
 	 * instruction -> ...
 	 */
 	pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa));
-	if (!is_error_noslot_pfn(pfn)) {
-		kvm_release_pfn_clean(pfn);
+
+	/*
+	 * If the instruction failed on the error pfn, it can not be fixed,
+	 * report the error to userspace.
+	 */
+	if (is_error_noslot_pfn(pfn))
+		return false;
+
+	kvm_release_pfn_clean(pfn);
+
+	/* The instructions are well-emulated on direct mmu. */
+	if (vcpu->arch.mmu.direct_map) {
+		if (ACCESS_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
+			kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
+
 		return true;
 	}

-	return false;
+again:
+	page_fault_count = ACCESS_ONCE(vcpu->kvm->arch.page_fault_count);
+
+	/*
+	 * if emulation was due to access to shadowed page table
+	 * and it failed try to unshadow page and re-enter the
+	 * guest to let CPU execute the instruction.
+	 */
+	kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
+	emulate = vcpu->arch.mmu.page_fault(vcpu, cr2, PFERR_WRITE_MASK, false);
+
+	/* The page fault path called above can increase the count. */
+	if (page_fault_count + 1 !=
+	      ACCESS_ONCE(vcpu->kvm->arch.page_fault_count))
+		goto again;
+
+	return !emulate;
 }

 static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
-- 
1.7.7.6



* Re: [PATCH 1/3] KVM: x86: clean up reexecute_instruction
  2012-11-19 23:58 ` [PATCH 1/3] KVM: x86: clean up reexecute_instruction Xiao Guangrong
@ 2012-11-20 12:11   ` Gleb Natapov
  2012-11-20 20:13     ` Xiao Guangrong
  0 siblings, 1 reply; 27+ messages in thread
From: Gleb Natapov @ 2012-11-20 12:11 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

On Tue, Nov 20, 2012 at 07:58:32AM +0800, Xiao Guangrong wrote:
> Small cleanup of reexecute_instruction; also use gpa_to_gfn in
> retry_instruction.
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  arch/x86/kvm/x86.c |   13 ++++++-------
>  1 files changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 52ae8b5..7be8452 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4477,19 +4477,18 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t gva)
>  	if (tdp_enabled)
>  		return false;
> 
> +	gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
> +	if (gpa == UNMAPPED_GVA)
> +		return true; /* let cpu generate fault */
> +
>  	/*
>  	 * if emulation was due to access to shadowed page table
>  	 * and it failed try to unshadow page and re-enter the
>  	 * guest to let CPU execute the instruction.
>  	 */
> -	if (kvm_mmu_unprotect_page_virt(vcpu, gva))
> +	if (kvm_mmu_unprotect_page(vcpu->kvm, c(gpa)))
What's c()? Should be gpa_to_gfn(gpa)?

>  		return true;
> 
> -	gpa = kvm_mmu_gva_to_gpa_system(vcpu, gva, NULL);
> -
> -	if (gpa == UNMAPPED_GVA)
> -		return true; /* let cpu generate fault */
> -
>  	/*
>  	 * Do not retry the unhandleable instruction if it faults on the
>  	 * readonly host memory, otherwise it will goto a infinite loop:
> @@ -4544,7 +4543,7 @@ static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
>  	if (!vcpu->arch.mmu.direct_map)
>  		gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2, NULL);
> 
> -	kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT);
> +	kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
> 
>  	return true;
>  }
> -- 
> 1.7.7.6
> 

--
			Gleb.


* Re: [PATCH 1/3] KVM: x86: clean up reexecute_instruction
  2012-11-20 12:11   ` Gleb Natapov
@ 2012-11-20 20:13     ` Xiao Guangrong
  0 siblings, 0 replies; 27+ messages in thread
From: Xiao Guangrong @ 2012-11-20 20:13 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Avi Kivity, Marcelo Tosatti, LKML, KVM

On 11/20/2012 08:11 PM, Gleb Natapov wrote:
> On Tue, Nov 20, 2012 at 07:58:32AM +0800, Xiao Guangrong wrote:
>> Small cleanup of reexecute_instruction; also use gpa_to_gfn in
>> retry_instruction.
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> ---
>>  arch/x86/kvm/x86.c |   13 ++++++-------
>>  1 files changed, 6 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 52ae8b5..7be8452 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -4477,19 +4477,18 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t gva)
>>  	if (tdp_enabled)
>>  		return false;
>>
>> +	gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
>> +	if (gpa == UNMAPPED_GVA)
>> +		return true; /* let cpu generate fault */
>> +
>>  	/*
>>  	 * if emulation was due to access to shadowed page table
>>  	 * and it failed try to unshadow page and re-enter the
>>  	 * guest to let CPU execute the instruction.
>>  	 */
>> -	if (kvm_mmu_unprotect_page_virt(vcpu, gva))
>> +	if (kvm_mmu_unprotect_page(vcpu->kvm, c(gpa)))
> What's c()? Should be gpa_to_gfn(gpa)?

Yes, it is a stupid copy-paste error. Thank you for pointing it out, Gleb!
Here is a new version with it fixed.

Subject: [PATCH 1/3] KVM: x86: clean up reexecute_instruction

Small cleanup of reexecute_instruction; also use gpa_to_gfn in
retry_instruction.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
---
 arch/x86/kvm/x86.c |   13 ++++++-------
 1 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 52ae8b5..7be8452 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4477,19 +4477,18 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t gva)
 	if (tdp_enabled)
 		return false;

+	gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
+	if (gpa == UNMAPPED_GVA)
+		return true; /* let cpu generate fault */
+
 	/*
 	 * if emulation was due to access to shadowed page table
 	 * and it failed try to unshadow page and re-enter the
 	 * guest to let CPU execute the instruction.
 	 */
-	if (kvm_mmu_unprotect_page_virt(vcpu, gva))
+	if (kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa)))
 		return true;

-	gpa = kvm_mmu_gva_to_gpa_system(vcpu, gva, NULL);
-
-	if (gpa == UNMAPPED_GVA)
-		return true; /* let cpu generate fault */
-
 	/*
 	 * Do not retry the unhandleable instruction if it faults on the
 	 * readonly host memory, otherwise it will goto a infinite loop:
@@ -4544,7 +4543,7 @@ static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
 	if (!vcpu->arch.mmu.direct_map)
 		gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2, NULL);

-	kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT);
+	kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));

 	return true;
 }
-- 
1.7.7.6




* Re: [PATCH 0/3] KVM: x86: improve reexecute_instruction
  2012-11-19 23:57 [PATCH 0/3] KVM: x86: improve reexecute_instruction Xiao Guangrong
                   ` (2 preceding siblings ...)
  2012-11-19 23:59 ` [PATCH 3/3] KVM: x86: improve reexecute_instruction Xiao Guangrong
@ 2012-11-23  1:16 ` Marcelo Tosatti
  3 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2012-11-23  1:16 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, LKML, KVM

On Tue, Nov 20, 2012 at 07:57:48AM +0800, Xiao Guangrong wrote:
> The current reexecute_instruction can not reliably detect failed instruction
> emulation. It allows the guest to retry all instructions except those that
> access an error pfn.
> 
> For example, these cases can not be detected:
> - when tdp is used
>   currently, it refuses to retry any instruction. If nested npt is used, the
>   emulation may be caused by a shadow page, which can be fixed by unshadowing
>   that page.
> 
> - for the shadow mmu
>   some cases are nested write-protection, for example, the page we want to
>   write is used as a PDE that maps itself. In this case, we should stop the
>   emulation and report the case to userspace.
> 
> The test case below, based on kvm-unit-test, triggers an infinite loop with
> the current code (ept = 0); after this patchset, the error is reported to Qemu.
> 
> Marcelo, I am afraid this test case can not be added to kvm-unit-test;
> autotest would be confused by it since it can abort Qemu.

That is OK, kvm-unit-test only executes tests listed at
x86/unittests.cfg.

> 
> Subject: [PATCH] access test: test unhandleable instruction
> 
> Test an instruction that can not be handled by kvm
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  x86/access.c |   27 ++++++++++++++++++++++++++-
>  1 files changed, 26 insertions(+), 1 deletions(-)
> 
> diff --git a/x86/access.c b/x86/access.c
> index 23a5995..e88db6b 100644
> --- a/x86/access.c
> +++ b/x86/access.c
> @@ -2,6 +2,7 @@
>  #include "libcflat.h"
>  #include "desc.h"
>  #include "processor.h"
> +#include "vm.h"
> 
>  #define smp_id() 0
> 
> @@ -739,6 +740,28 @@ err:
>  	return 0;
>  }
> 
> +static int check_retry_unhandleable_ins(ac_pool_t *pool)
> +{
> +	unsigned long mem = 30 * 1024 * 1024;
> +	unsigned long esp;
> +	ac_test_t at;
> +
> +	ac_test_init(&at, (void *)(0x123406003000));
> +	at.flags[AC_PDE_PRESENT] = at.flags[AC_PDE_WRITABLE] = 1;
> +	at.flags[AC_PTE_PRESENT] = at.flags[AC_PTE_WRITABLE] = 1;
> +	at.flags[AC_CPU_CR0_WP] = 1;
> +
> +	at.phys = mem;
> +	ac_setup_specific_pages(&at, pool, mem, 0);
> +
> +	asm volatile("mov %%rsp, %%rax  \n\t" : "=a"(esp));
> +	asm volatile("mov %%rax, %%rsp  \n\t" : : "a"(0x123406003000 + 0xf0));
> +	asm volatile ("int $0x3 \n\t");
> +	asm volatile("mov %%rax, %%rsp  \n\t" : : "a"(esp));
> +
> +	return 1;
> +}
> +
>  int ac_test_exec(ac_test_t *at, ac_pool_t *pool)
>  {
>      int r;
> @@ -756,7 +779,8 @@ const ac_test_fn ac_test_cases[] =
>  {
>  	corrupt_hugepage_triger,
>  	check_pfec_on_prefetch_pte,
> -	check_smep_andnot_wp
> +	check_smep_andnot_wp,
> +	check_retry_unhandleable_ins
>  };
> 
>  int ac_test_run(void)
> @@ -770,6 +794,7 @@ int ac_test_run(void)
>      tests = successes = 0;
>      ac_env_int(&pool);
>      ac_test_init(&at, (void *)(0x123400000000 + 16 * smp_id()));
> +
>      do {
>  	if (at.flags[AC_CPU_CR4_SMEP] && (ptl2[2] & 0x4))
>  		ptl2[2] -= 0x4;
> -- 
> 1.7.7.6
> 


* Re: [PATCH 2/3] KVM: x86: let reexecute_instruction work for tdp
  2012-11-19 23:59 ` [PATCH 2/3] KVM: x86: let reexecute_instruction work for tdp Xiao Guangrong
@ 2012-11-26 22:37   ` Marcelo Tosatti
  2012-11-27  3:13     ` Xiao Guangrong
  0 siblings, 1 reply; 27+ messages in thread
From: Marcelo Tosatti @ 2012-11-26 22:37 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, LKML, KVM

On Tue, Nov 20, 2012 at 07:59:10AM +0800, Xiao Guangrong wrote:
> Currently, reexecute_instruction refuses to retry any instruction. If
> nested npt is used, the emulation may be caused by a shadow page, which can
> be fixed by dropping that shadow page.
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  arch/x86/kvm/x86.c |   14 ++++++++------
>  1 files changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7be8452..5fe72cc 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4469,17 +4469,19 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu)
>  	return r;
>  }
> 
> -static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t gva)
> +static bool reexecute_instruction(struct kvm_vcpu *vcpu, unsigned long cr2)
>  {
> -	gpa_t gpa;
> +	gpa_t gpa = cr2;
>  	pfn_t pfn;
> 
> -	if (tdp_enabled)
> +	if (!ACCESS_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
>  		return false;

How is indirect_shadow_pages protected? Why is ACCESS_ONCE() being used
to read it?

> -	gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
> -	if (gpa == UNMAPPED_GVA)
> -		return true; /* let cpu generate fault */
> +	if (!vcpu->arch.mmu.direct_map) {
> +		gpa = kvm_mmu_gva_to_gpa_read(vcpu, cr2, NULL);
> +		if (gpa == UNMAPPED_GVA)
> +			return true; /* let cpu generate fault */
> +	}
> 
>  	/*
>  	 * if emulation was due to access to shadowed page table
> -- 
> 1.7.7.6


* Re: [PATCH 3/3] KVM: x86: improve reexecute_instruction
  2012-11-19 23:59 ` [PATCH 3/3] KVM: x86: improve reexecute_instruction Xiao Guangrong
@ 2012-11-26 22:41   ` Marcelo Tosatti
  2012-11-27  3:30     ` Xiao Guangrong
  0 siblings, 1 reply; 27+ messages in thread
From: Marcelo Tosatti @ 2012-11-26 22:41 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, LKML, KVM

On Tue, Nov 20, 2012 at 07:59:53AM +0800, Xiao Guangrong wrote:
> The current reexecute_instruction can not reliably detect failed instruction
> emulation. It allows the guest to retry all instructions except those that
> access an error pfn.
> 
> For example, some cases are nested write-protection: the page we want to
> write is used as a PDE that maps itself. In this case, we should stop the
> emulation and report the case to userspace.
> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
> ---
>  arch/x86/include/asm/kvm_host.h |    2 +
>  arch/x86/kvm/paging_tmpl.h      |    2 +
>  arch/x86/kvm/x86.c              |   54 ++++++++++++++++++++++++++++-----------
>  3 files changed, 43 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index b2e11f4..c5eb52f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -566,6 +566,8 @@ struct kvm_arch {
>  	u64 hv_guest_os_id;
>  	u64 hv_hypercall;
> 
> +	/* synchronizing reexecute_instruction and page fault path. */
> +	u64 page_fault_count;
>  	#ifdef CONFIG_KVM_MMU_AUDIT
>  	int audit_point;
>  	#endif
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 891eb6d..d55ad89 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -568,6 +568,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
>  	if (mmu_notifier_retry(vcpu->kvm, mmu_seq))
>  		goto out_unlock;
> 
> +	vcpu->kvm->arch.page_fault_count++;
> +
>  	kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT);
>  	kvm_mmu_free_some_pages(vcpu);
>  	if (!force_pt_level)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5fe72cc..2fe484b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4473,37 +4473,61 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, unsigned long cr2)
>  {
>  	gpa_t gpa = cr2;
>  	pfn_t pfn;
> -
> -	if (!ACCESS_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
> -		return false;
> +	u64 page_fault_count;
> +	int emulate;
> 
>  	if (!vcpu->arch.mmu.direct_map) {
>  		gpa = kvm_mmu_gva_to_gpa_read(vcpu, cr2, NULL);
> +		/*
> +		 * If the mapping is invalid in guest, let cpu retry
> +		 * it to generate fault.
> +		 */
>  		if (gpa == UNMAPPED_GVA)
> -			return true; /* let cpu generate fault */
> +			return true;
>  	}
> 
>  	/*
> -	 * if emulation was due to access to shadowed page table
> -	 * and it failed try to unshadow page and re-enter the
> -	 * guest to let CPU execute the instruction.
> -	 */
> -	if (kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa)))
> -		return true;
> -
> -	/*
>  	 * Do not retry the unhandleable instruction if it faults on the
>  	 * readonly host memory, otherwise it will goto a infinite loop:
>  	 * retry instruction -> write #PF -> emulation fail -> retry
>  	 * instruction -> ...
>  	 */
>  	pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa));
> -	if (!is_error_noslot_pfn(pfn)) {
> -		kvm_release_pfn_clean(pfn);
> +
> +	/*
> +	 * If the instruction failed on the error pfn, it can not be fixed,
> +	 * report the error to userspace.
> +	 */
> +	if (is_error_noslot_pfn(pfn))
> +		return false;
> +
> +	kvm_release_pfn_clean(pfn);
> +
> +	/* The instructions are well-emulated on direct mmu. */
> +	if (vcpu->arch.mmu.direct_map) {
> +		if (ACCESS_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
> +			kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
> +
>  		return true;
>  	}
> 
> -	return false;
> +again:
> +	page_fault_count = ACCESS_ONCE(vcpu->kvm->arch.page_fault_count);
> +
> +	/*
> +	 * if emulation was due to access to shadowed page table
> +	 * and it failed try to unshadow page and re-enter the
> +	 * guest to let CPU execute the instruction.
> +	 */
> +	kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
> +	emulate = vcpu->arch.mmu.page_fault(vcpu, cr2, PFERR_WRITE_MASK, false);

Can you explain what is the objective here?

> +	/* The page fault path called above can increase the count. */
> +	if (page_fault_count + 1 !=
> +	      ACCESS_ONCE(vcpu->kvm->arch.page_fault_count))
> +		goto again;
> +
> +	return !emulate;
>  }
> 
>  static bool retry_instruction(struct x86_emulate_ctxt *ctxt,



* Re: [PATCH 2/3] KVM: x86: let reexecute_instruction work for tdp
  2012-11-26 22:37   ` Marcelo Tosatti
@ 2012-11-27  3:13     ` Xiao Guangrong
  2012-11-27 23:32       ` Marcelo Tosatti
  0 siblings, 1 reply; 27+ messages in thread
From: Xiao Guangrong @ 2012-11-27  3:13 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, LKML, KVM

On 11/27/2012 06:37 AM, Marcelo Tosatti wrote:
> On Tue, Nov 20, 2012 at 07:59:10AM +0800, Xiao Guangrong wrote:
>> Currently, reexecute_instruction refuses to retry any instruction. If
>> nested npt is used, the emulation may be caused by a shadow page, which can
>> be fixed by dropping that shadow page.
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> ---
>>  arch/x86/kvm/x86.c |   14 ++++++++------
>>  1 files changed, 8 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 7be8452..5fe72cc 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -4469,17 +4469,19 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu)
>>  	return r;
>>  }
>>
>> -static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t gva)
>> +static bool reexecute_instruction(struct kvm_vcpu *vcpu, unsigned long cr2)
>>  {
>> -	gpa_t gpa;
>> +	gpa_t gpa = cr2;
>>  	pfn_t pfn;
>>
>> -	if (tdp_enabled)
>> +	if (!ACCESS_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
>>  		return false;
> 
> How is indirect_shadow_pages protected? Why is ACCESS_ONCE() being used
> to read it?

Hi Marcelo,

It is protected by mmu-lock, since it is only changed while mmu-lock is held.
ACCESS_ONCE is used on the read path to keep the compiler from optimizing
away or repeating the load.



* Re: [PATCH 3/3] KVM: x86: improve reexecute_instruction
  2012-11-26 22:41   ` Marcelo Tosatti
@ 2012-11-27  3:30     ` Xiao Guangrong
  2012-11-27 23:42       ` Marcelo Tosatti
  2012-11-28 14:12       ` Gleb Natapov
  0 siblings, 2 replies; 27+ messages in thread
From: Xiao Guangrong @ 2012-11-27  3:30 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, LKML, KVM

On 11/27/2012 06:41 AM, Marcelo Tosatti wrote:

>>
>> -	return false;
>> +again:
>> +	page_fault_count = ACCESS_ONCE(vcpu->kvm->arch.page_fault_count);
>> +
>> +	/*
>> +	 * if emulation was due to access to shadowed page table
>> +	 * and it failed try to unshadow page and re-enter the
>> +	 * guest to let CPU execute the instruction.
>> +	 */
>> +	kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
>> +	emulate = vcpu->arch.mmu.page_fault(vcpu, cr2, PFERR_WRITE_MASK, false);
> 
> Can you explain what is the objective here?
> 

Sure. :)

The instruction emulation is caused by a write fault on cr2. After
unprotecting the target page, we call vcpu->arch.mmu.page_fault to fix the
mapping of cr2. If it returns 1, the mmu can not fix the mapping and we
should report the error; otherwise it is fine to return to the guest and
let it re-execute the instruction again.

page_fault_count is used to avoid races with other vcpus: after we
unprotect the target page, another vcpu can enter the page fault path and
make the page write-protected again.

This way helps us detect all the cases that the mmu can not fix.



* Re: [PATCH 2/3] KVM: x86: let reexecute_instruction work for tdp
  2012-11-27  3:13     ` Xiao Guangrong
@ 2012-11-27 23:32       ` Marcelo Tosatti
  2012-11-28  3:15         ` Xiao Guangrong
  0 siblings, 1 reply; 27+ messages in thread
From: Marcelo Tosatti @ 2012-11-27 23:32 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, LKML, KVM

On Tue, Nov 27, 2012 at 11:13:11AM +0800, Xiao Guangrong wrote:
> >> +static bool reexecute_instruction(struct kvm_vcpu *vcpu, unsigned long cr2)
> >>  {
> >> -	gpa_t gpa;
> >> +	gpa_t gpa = cr2;
> >>  	pfn_t pfn;
> >>
> >> -	if (tdp_enabled)
> >> +	if (!ACCESS_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
> >>  		return false;
> > 
> > How is indirect_shadow_pages protected? Why is ACCESS_ONCE() being used
> > to read it?
> 
> Hi Marcelo,
> 
> It is protected by mmu-lock, since it is only changed while mmu-lock is held.
> ACCESS_ONCE is used on the read path to keep the compiler from optimizing
> away or repeating the load.

Please switch to mmu_lock protection; there is no reason to access this
variable locklessly - it is not performance critical.

For example, there is no use of barriers when modifying the variable.



* Re: [PATCH 3/3] KVM: x86: improve reexecute_instruction
  2012-11-27  3:30     ` Xiao Guangrong
@ 2012-11-27 23:42       ` Marcelo Tosatti
  2012-11-28  3:33         ` Xiao Guangrong
  2012-11-28 14:12       ` Gleb Natapov
  1 sibling, 1 reply; 27+ messages in thread
From: Marcelo Tosatti @ 2012-11-27 23:42 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, LKML, KVM

On Tue, Nov 27, 2012 at 11:30:24AM +0800, Xiao Guangrong wrote:
> On 11/27/2012 06:41 AM, Marcelo Tosatti wrote:
> 
> >>
> >> -	return false;
> >> +again:
> >> +	page_fault_count = ACCESS_ONCE(vcpu->kvm->arch.page_fault_count);
> >> +
> >> +	/*
> >> +	 * if emulation was due to access to shadowed page table
> >> +	 * and it failed try to unshadow page and re-enter the
> >> +	 * guest to let CPU execute the instruction.
> >> +	 */
> >> +	kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
> >> +	emulate = vcpu->arch.mmu.page_fault(vcpu, cr3, PFERR_WRITE_MASK, false);
> > 
> > Can you explain what is the objective here?
> > 
> 
> Sure. :)
> 
> The instruction emulation is caused by fault access on cr3. After unprotect
> the target page, we call vcpu->arch.mmu.page_fault to fix the mapping of cr3.
> if it return 1, mmu can not fix the mapping, we should report the error,
> otherwise it is good to return to guest and let it re-execute the instruction
> again.
> 
> page_fault_count is used to avoid the race on other vcpus, since after we
> unprotect the target page, other cpu can enter page fault path and let the
> page be write-protected again.
> 
> This way can help us to detect all the case that mmu can not be fixed.

How about recording the gfn number for shadow pages that have been
shadowed in the current pagefault run? (which is cheap, compared to
shadowing these pages).

If the failed instruction emulation is a write to one of these gfns, then
fail.




* Re: [PATCH 2/3] KVM: x86: let reexecute_instruction work for tdp
  2012-11-27 23:32       ` Marcelo Tosatti
@ 2012-11-28  3:15         ` Xiao Guangrong
  2012-11-28 14:01           ` Gleb Natapov
  0 siblings, 1 reply; 27+ messages in thread
From: Xiao Guangrong @ 2012-11-28  3:15 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, LKML, KVM

On 11/28/2012 07:32 AM, Marcelo Tosatti wrote:
> On Tue, Nov 27, 2012 at 11:13:11AM +0800, Xiao Guangrong wrote:
>>>> +static bool reexecute_instruction(struct kvm_vcpu *vcpu, unsigned long cr2)
>>>>  {
>>>> -	gpa_t gpa;
>>>> +	gpa_t gpa = cr2;
>>>>  	pfn_t pfn;
>>>>
>>>> -	if (tdp_enabled)
>>>> +	if (!ACCESS_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
>>>>  		return false;
>>>
>>> How is indirect_shadow_pages protected? Why is ACCESS_ONCE() being used
>>> to read it?
>>
>> Hi Marcelo,
>>
>> It is protected by mmu-lock for it only be changed when mmu-lock is hold. And
>> ACCESS_ONCE is used on read path avoiding magic optimization from compiler.
> 
> Please switch to mmu_lock protection, there is no reason to have access
> to this variable locklessly - not performance critical.
> 
> For example, there is no use of barriers when modifying the variable.

This is not bad. The worst case is that the direct mmu fails to unprotect the shadow
pages (it sees indirect_shadow_pages = 0, but there are still pages being shadowed);
after re-entering the guest, we will go into reexecute_instruction again, and then it
will remove the shadow pages.

But, i do not have a strong opinion on it; i respect your idea! :)



* Re: [PATCH 3/3] KVM: x86: improve reexecute_instruction
  2012-11-27 23:42       ` Marcelo Tosatti
@ 2012-11-28  3:33         ` Xiao Guangrong
  0 siblings, 0 replies; 27+ messages in thread
From: Xiao Guangrong @ 2012-11-28  3:33 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, LKML, KVM

On 11/28/2012 07:42 AM, Marcelo Tosatti wrote:
> On Tue, Nov 27, 2012 at 11:30:24AM +0800, Xiao Guangrong wrote:
>> On 11/27/2012 06:41 AM, Marcelo Tosatti wrote:
>>
>>>>
>>>> -	return false;
>>>> +again:
>>>> +	page_fault_count = ACCESS_ONCE(vcpu->kvm->arch.page_fault_count);
>>>> +
>>>> +	/*
>>>> +	 * if emulation was due to access to shadowed page table
>>>> +	 * and it failed try to unshadow page and re-enter the
>>>> +	 * guest to let CPU execute the instruction.
>>>> +	 */
>>>> +	kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
>>>> +	emulate = vcpu->arch.mmu.page_fault(vcpu, cr3, PFERR_WRITE_MASK, false);
>>>
>>> Can you explain what is the objective here?
>>>
>>
>> Sure. :)
>>
>> The instruction emulation is caused by fault access on cr3. After unprotect
>> the target page, we call vcpu->arch.mmu.page_fault to fix the mapping of cr3.
>> if it return 1, mmu can not fix the mapping, we should report the error,
>> otherwise it is good to return to guest and let it re-execute the instruction
>> again.
>>
>> page_fault_count is used to avoid the race on other vcpus, since after we
>> unprotect the target page, other cpu can enter page fault path and let the
>> page be write-protected again.
>>
>> This way can help us to detect all the case that mmu can not be fixed.
> 
> How about recording the gfn number for shadow pages that have been
> shadowed in the current pagefault run? (which is cheap, compared to
> shadowing these pages).
> 

Marcelo,

Thanks for your idea!

If we use this way, we need to cache the gfns in the vcpu struct.

Actually, i have considered an approach like yours, that is, walking all
page tables of the guest and checking whether the page-table gfns contain
the target gfn. But that needs a change to mmu->gva_to_pfn, or a new
method to walk the guest's page tables.

However, reexecute_instruction is really an unlikely path; both of these
ways would make the mmu code more complex and/or introduce unnecessary
overhead for the common cases.

It looks like the way used in this patch is the simplest and does no harm
to the core code.





* Re: [PATCH 2/3] KVM: x86: let reexecute_instruction work for tdp
  2012-11-28  3:15         ` Xiao Guangrong
@ 2012-11-28 14:01           ` Gleb Natapov
  2012-11-28 14:55             ` Xiao Guangrong
  0 siblings, 1 reply; 27+ messages in thread
From: Gleb Natapov @ 2012-11-28 14:01 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, Avi Kivity, LKML, KVM

On Wed, Nov 28, 2012 at 11:15:13AM +0800, Xiao Guangrong wrote:
> On 11/28/2012 07:32 AM, Marcelo Tosatti wrote:
> > On Tue, Nov 27, 2012 at 11:13:11AM +0800, Xiao Guangrong wrote:
> >>>> +static bool reexecute_instruction(struct kvm_vcpu *vcpu, unsigned long cr2)
> >>>>  {
> >>>> -	gpa_t gpa;
> >>>> +	gpa_t gpa = cr2;
> >>>>  	pfn_t pfn;
> >>>>
> >>>> -	if (tdp_enabled)
> >>>> +	if (!ACCESS_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
> >>>>  		return false;
> >>>
> >>> How is indirect_shadow_pages protected? Why is ACCESS_ONCE() being used
> >>> to read it?
> >>
> >> Hi Marcelo,
> >>
> >> It is protected by mmu-lock for it only be changed when mmu-lock is hold. And
> >> ACCESS_ONCE is used on read path avoiding magic optimization from compiler.
> > 
> > Please switch to mmu_lock protection, there is no reason to have access
> > to this variable locklessly - not performance critical.
> > 
> > For example, there is no use of barriers when modifying the variable.
> 
> This is not bad. The worst case is that the direct mmu fails to unprotect the shadow
> pages (it sees indirect_shadow_pages = 0, but there are still pages being shadowed);
> after re-entering the guest, we will go into reexecute_instruction again, and then it
> will remove the shadow pages.
> 
Can't the same scenario happen even with mmu_lock held around the
indirect_shadow_pages access?

> But, i do not have a strong opinion on it; i respect your idea! :)
> 

--
			Gleb.


* Re: [PATCH 3/3] KVM: x86: improve reexecute_instruction
  2012-11-27  3:30     ` Xiao Guangrong
  2012-11-27 23:42       ` Marcelo Tosatti
@ 2012-11-28 14:12       ` Gleb Natapov
  2012-11-28 14:59         ` Xiao Guangrong
  1 sibling, 1 reply; 27+ messages in thread
From: Gleb Natapov @ 2012-11-28 14:12 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, Avi Kivity, LKML, KVM

On Tue, Nov 27, 2012 at 11:30:24AM +0800, Xiao Guangrong wrote:
> On 11/27/2012 06:41 AM, Marcelo Tosatti wrote:
> 
> >>
> >> -	return false;
> >> +again:
> >> +	page_fault_count = ACCESS_ONCE(vcpu->kvm->arch.page_fault_count);
> >> +
> >> +	/*
> >> +	 * if emulation was due to access to shadowed page table
> >> +	 * and it failed try to unshadow page and re-enter the
> >> +	 * guest to let CPU execute the instruction.
> >> +	 */
> >> +	kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
> >> +	emulate = vcpu->arch.mmu.page_fault(vcpu, cr3, PFERR_WRITE_MASK, false);
> > 
> > Can you explain what is the objective here?
> > 
> 
> Sure. :)
> 
> The instruction emulation is caused by fault access on cr3. After unprotect
> the target page, we call vcpu->arch.mmu.page_fault to fix the mapping of cr3.
> if it return 1, mmu can not fix the mapping, we should report the error,
> otherwise it is good to return to guest and let it re-execute the instruction
> again.
> 
> page_fault_count is used to avoid the race on other vcpus, since after we
> unprotect the target page, other cpu can enter page fault path and let the
> page be write-protected again.
> 
> This way can help us to detect all the case that mmu can not be fixed.
> 
Can you write this in a comment above vcpu->arch.mmu.page_fault()?

--
			Gleb.


* Re: [PATCH 2/3] KVM: x86: let reexecute_instruction work for tdp
  2012-11-28 14:01           ` Gleb Natapov
@ 2012-11-28 14:55             ` Xiao Guangrong
  2012-11-28 22:07               ` Marcelo Tosatti
  0 siblings, 1 reply; 27+ messages in thread
From: Xiao Guangrong @ 2012-11-28 14:55 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Marcelo Tosatti, Avi Kivity, LKML, KVM

On 11/28/2012 10:01 PM, Gleb Natapov wrote:
> On Wed, Nov 28, 2012 at 11:15:13AM +0800, Xiao Guangrong wrote:
>> On 11/28/2012 07:32 AM, Marcelo Tosatti wrote:
>>> On Tue, Nov 27, 2012 at 11:13:11AM +0800, Xiao Guangrong wrote:
>>>>>> +static bool reexecute_instruction(struct kvm_vcpu *vcpu, unsigned long cr2)
>>>>>>  {
>>>>>> -	gpa_t gpa;
>>>>>> +	gpa_t gpa = cr2;
>>>>>>  	pfn_t pfn;
>>>>>>
>>>>>> -	if (tdp_enabled)
>>>>>> +	if (!ACCESS_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
>>>>>>  		return false;
>>>>>
>>>>> How is indirect_shadow_pages protected? Why is ACCESS_ONCE() being used
>>>>> to read it?
>>>>
>>>> Hi Marcelo,
>>>>
>>>> It is protected by mmu-lock for it only be changed when mmu-lock is hold. And
>>>> ACCESS_ONCE is used on read path avoiding magic optimization from compiler.
>>>
>>> Please switch to mmu_lock protection, there is no reason to have access
>>> to this variable locklessly - not performance critical.
>>>
>>> For example, there is no use of barriers when modifying the variable.
>>
>> This is not bad. The worst case is that the direct mmu fails to unprotect the shadow
>> pages (it sees indirect_shadow_pages = 0, but there are still pages being shadowed);
>> after re-entering the guest, we will go into reexecute_instruction again, and then it
>> will remove the shadow pages.
>>
> Can't the same scenario happen even with mmu_lock held around the
> indirect_shadow_pages access?

Hmm..., i also think there is no difference. Even using mmu-lock, we can not
prevent the target pfn from being write-protected later. Marcelo?



* Re: [PATCH 3/3] KVM: x86: improve reexecute_instruction
  2012-11-28 14:12       ` Gleb Natapov
@ 2012-11-28 14:59         ` Xiao Guangrong
  2012-11-28 21:57           ` Marcelo Tosatti
  0 siblings, 1 reply; 27+ messages in thread
From: Xiao Guangrong @ 2012-11-28 14:59 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Marcelo Tosatti, Avi Kivity, LKML, KVM

On 11/28/2012 10:12 PM, Gleb Natapov wrote:
> On Tue, Nov 27, 2012 at 11:30:24AM +0800, Xiao Guangrong wrote:
>> On 11/27/2012 06:41 AM, Marcelo Tosatti wrote:
>>
>>>>
>>>> -	return false;
>>>> +again:
>>>> +	page_fault_count = ACCESS_ONCE(vcpu->kvm->arch.page_fault_count);
>>>> +
>>>> +	/*
>>>> +	 * if emulation was due to access to shadowed page table
>>>> +	 * and it failed try to unshadow page and re-enter the
>>>> +	 * guest to let CPU execute the instruction.
>>>> +	 */
>>>> +	kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
>>>> +	emulate = vcpu->arch.mmu.page_fault(vcpu, cr3, PFERR_WRITE_MASK, false);
>>>
>>> Can you explain what is the objective here?
>>>
>>
>> Sure. :)
>>
>> The instruction emulation is caused by fault access on cr3. After unprotect
>> the target page, we call vcpu->arch.mmu.page_fault to fix the mapping of cr3.
>> if it return 1, mmu can not fix the mapping, we should report the error,
>> otherwise it is good to return to guest and let it re-execute the instruction
>> again.
>>
>> page_fault_count is used to avoid the race on other vcpus, since after we
>> unprotect the target page, other cpu can enter page fault path and let the
>> page be write-protected again.
>>
>> This way can help us to detect all the case that mmu can not be fixed.
>>
> Can you write this in a comment above vcpu->arch.mmu.page_fault()?

Okay, if Marcelo does not object this way. :)



* Re: [PATCH 3/3] KVM: x86: improve reexecute_instruction
  2012-11-28 14:59         ` Xiao Guangrong
@ 2012-11-28 21:57           ` Marcelo Tosatti
  2012-11-28 22:40             ` Xiao Guangrong
  0 siblings, 1 reply; 27+ messages in thread
From: Marcelo Tosatti @ 2012-11-28 21:57 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Gleb Natapov, Avi Kivity, LKML, KVM

On Wed, Nov 28, 2012 at 10:59:35PM +0800, Xiao Guangrong wrote:
> On 11/28/2012 10:12 PM, Gleb Natapov wrote:
> > On Tue, Nov 27, 2012 at 11:30:24AM +0800, Xiao Guangrong wrote:
> >> On 11/27/2012 06:41 AM, Marcelo Tosatti wrote:
> >>
> >>>>
> >>>> -	return false;
> >>>> +again:
> >>>> +	page_fault_count = ACCESS_ONCE(vcpu->kvm->arch.page_fault_count);
> >>>> +
> >>>> +	/*
> >>>> +	 * if emulation was due to access to shadowed page table
> >>>> +	 * and it failed try to unshadow page and re-enter the
> >>>> +	 * guest to let CPU execute the instruction.
> >>>> +	 */
> >>>> +	kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
> >>>> +	emulate = vcpu->arch.mmu.page_fault(vcpu, cr3, PFERR_WRITE_MASK, false);
> >>>
> >>> Can you explain what is the objective here?
> >>>
> >>
> >> Sure. :)
> >>
> >> The instruction emulation is caused by fault access on cr3. After unprotect
> >> the target page, we call vcpu->arch.mmu.page_fault to fix the mapping of cr3.
> >> if it return 1, mmu can not fix the mapping, we should report the error,
> >> otherwise it is good to return to guest and let it re-execute the instruction
> >> again.
> >>
> >> page_fault_count is used to avoid the race on other vcpus, since after we
> >> unprotect the target page, other cpu can enter page fault path and let the
> >> page be write-protected again.
> >>
> >> This way can help us to detect all the case that mmu can not be fixed.
> >>
> > Can you write this in a comment above vcpu->arch.mmu.page_fault()?
> 
> Okay, if Marcelo does not object this way. :)

I do object, since it is possible to detect precisely the condition by 
storing which gfns have been cached.

Then, Xiao, you need a way to handle large read-only sptes.



* Re: [PATCH 2/3] KVM: x86: let reexecute_instruction work for tdp
  2012-11-28 14:55             ` Xiao Guangrong
@ 2012-11-28 22:07               ` Marcelo Tosatti
  0 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2012-11-28 22:07 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Gleb Natapov, Avi Kivity, LKML, KVM

On Wed, Nov 28, 2012 at 10:55:26PM +0800, Xiao Guangrong wrote:
> On 11/28/2012 10:01 PM, Gleb Natapov wrote:
> > On Wed, Nov 28, 2012 at 11:15:13AM +0800, Xiao Guangrong wrote:
> >> On 11/28/2012 07:32 AM, Marcelo Tosatti wrote:
> >>> On Tue, Nov 27, 2012 at 11:13:11AM +0800, Xiao Guangrong wrote:
> >>>>>> +static bool reexecute_instruction(struct kvm_vcpu *vcpu, unsigned long cr2)
> >>>>>>  {
> >>>>>> -	gpa_t gpa;
> >>>>>> +	gpa_t gpa = cr2;
> >>>>>>  	pfn_t pfn;
> >>>>>>
> >>>>>> -	if (tdp_enabled)
> >>>>>> +	if (!ACCESS_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
> >>>>>>  		return false;
> >>>>>
> >>>>> How is indirect_shadow_pages protected? Why is ACCESS_ONCE() being used
> >>>>> to read it?
> >>>>
> >>>> Hi Marcelo,
> >>>>
> >>>> It is protected by mmu-lock for it only be changed when mmu-lock is hold. And
> >>>> ACCESS_ONCE is used on read path avoiding magic optimization from compiler.
> >>>
> >>> Please switch to mmu_lock protection, there is no reason to have access
> >>> to this variable locklessly - not performance critical.
> >>>
> >>> For example, there is no use of barriers when modifying the variable.
> >>
> >> This is not bad. The worst case is that the direct mmu fails to unprotect the shadow
> >> pages (it sees indirect_shadow_pages = 0, but there are still pages being shadowed);
> >> after re-entering the guest, we will go into reexecute_instruction again, and then it
> >> will remove the shadow pages.
> >>
> > Can't the same scenario happen even with mmu_lock held around the
> > indirect_shadow_pages access?
> 
> Hmm..., i also think there is no difference. Even using mmu-lock, we can not
> prevent the target pfn from being write-protected later. Marcelo?

In this particular case, it appears to be harmless (unsure if
kvm_mmu_pte_write one is safe). The point is, lockless access should not
be special.

Lockless access must be carefully documented (the access protocol for the
variable documented, all possible cases listed), and done only when
necessary for performance. Otherwise, don't do it.

In this case, it's not necessary.



* Re: [PATCH 3/3] KVM: x86: improve reexecute_instruction
  2012-11-28 21:57           ` Marcelo Tosatti
@ 2012-11-28 22:40             ` Xiao Guangrong
  2012-11-28 23:16               ` Xiao Guangrong
  2012-11-29  0:21               ` Marcelo Tosatti
  0 siblings, 2 replies; 27+ messages in thread
From: Xiao Guangrong @ 2012-11-28 22:40 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Gleb Natapov, Avi Kivity, LKML, KVM

On 11/29/2012 05:57 AM, Marcelo Tosatti wrote:
> On Wed, Nov 28, 2012 at 10:59:35PM +0800, Xiao Guangrong wrote:
>> On 11/28/2012 10:12 PM, Gleb Natapov wrote:
>>> On Tue, Nov 27, 2012 at 11:30:24AM +0800, Xiao Guangrong wrote:
>>>> On 11/27/2012 06:41 AM, Marcelo Tosatti wrote:
>>>>
>>>>>>
>>>>>> -	return false;
>>>>>> +again:
>>>>>> +	page_fault_count = ACCESS_ONCE(vcpu->kvm->arch.page_fault_count);
>>>>>> +
>>>>>> +	/*
>>>>>> +	 * if emulation was due to access to shadowed page table
>>>>>> +	 * and it failed try to unshadow page and re-enter the
>>>>>> +	 * guest to let CPU execute the instruction.
>>>>>> +	 */
>>>>>> +	kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
>>>>>> +	emulate = vcpu->arch.mmu.page_fault(vcpu, cr3, PFERR_WRITE_MASK, false);
>>>>>
>>>>> Can you explain what is the objective here?
>>>>>
>>>>
>>>> Sure. :)
>>>>
>>>> The instruction emulation is caused by fault access on cr3. After unprotect
>>>> the target page, we call vcpu->arch.mmu.page_fault to fix the mapping of cr3.
>>>> if it return 1, mmu can not fix the mapping, we should report the error,
>>>> otherwise it is good to return to guest and let it re-execute the instruction
>>>> again.
>>>>
>>>> page_fault_count is used to avoid the race on other vcpus, since after we
>>>> unprotect the target page, other cpu can enter page fault path and let the
>>>> page be write-protected again.
>>>>
>>>> This way can help us to detect all the case that mmu can not be fixed.
>>>>
>>> Can you write this in a comment above vcpu->arch.mmu.page_fault()?
>>
>> Okay, if Marcelo does not object this way. :)
> 
> I do object, since it is possible to detect precisely the condition by 
> storing which gfns have been cached.
> 
> Then, Xiao, you need a way to handle large read-only sptes.

Sorry, Marcelo, i am still confused why read-only sptes can not work
under this patch?

The code after read-only large spte is:

+		if ((level > PT_PAGE_TABLE_LEVEL &&
+		   has_wrprotected_page(vcpu->kvm, gfn, level)) ||
+		      mmu_need_write_protect(vcpu, gfn, can_unsync)) {
 			pgprintk("%s: found shadow page for %llx, marking ro\n",
 				 __func__, gfn);
 			ret = 1;

It return 1, then reexecute_instruction return 0. It is the same as without
readonly large-spte.




* Re: [PATCH 3/3] KVM: x86: improve reexecute_instruction
  2012-11-28 22:40             ` Xiao Guangrong
@ 2012-11-28 23:16               ` Xiao Guangrong
  2012-11-29  0:23                 ` Marcelo Tosatti
  2012-11-29  0:21               ` Marcelo Tosatti
  1 sibling, 1 reply; 27+ messages in thread
From: Xiao Guangrong @ 2012-11-28 23:16 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Marcelo Tosatti, Gleb Natapov, Avi Kivity, LKML, KVM

On 11/29/2012 06:40 AM, Xiao Guangrong wrote:
> On 11/29/2012 05:57 AM, Marcelo Tosatti wrote:
>> On Wed, Nov 28, 2012 at 10:59:35PM +0800, Xiao Guangrong wrote:
>>> On 11/28/2012 10:12 PM, Gleb Natapov wrote:
>>>> On Tue, Nov 27, 2012 at 11:30:24AM +0800, Xiao Guangrong wrote:
>>>>> On 11/27/2012 06:41 AM, Marcelo Tosatti wrote:
>>>>>
>>>>>>>
>>>>>>> -	return false;
>>>>>>> +again:
>>>>>>> +	page_fault_count = ACCESS_ONCE(vcpu->kvm->arch.page_fault_count);
>>>>>>> +
>>>>>>> +	/*
>>>>>>> +	 * if emulation was due to access to shadowed page table
>>>>>>> +	 * and it failed try to unshadow page and re-enter the
>>>>>>> +	 * guest to let CPU execute the instruction.
>>>>>>> +	 */
>>>>>>> +	kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
>>>>>>> +	emulate = vcpu->arch.mmu.page_fault(vcpu, cr3, PFERR_WRITE_MASK, false);
>>>>>>
>>>>>> Can you explain what is the objective here?
>>>>>>
>>>>>
>>>>> Sure. :)
>>>>>
>>>>> The instruction emulation is caused by fault access on cr3. After unprotect
>>>>> the target page, we call vcpu->arch.mmu.page_fault to fix the mapping of cr3.
>>>>> if it return 1, mmu can not fix the mapping, we should report the error,
>>>>> otherwise it is good to return to guest and let it re-execute the instruction
>>>>> again.
>>>>>
>>>>> page_fault_count is used to avoid the race on other vcpus, since after we
>>>>> unprotect the target page, other cpu can enter page fault path and let the
>>>>> page be write-protected again.
>>>>>
>>>>> This way can help us to detect all the case that mmu can not be fixed.
>>>>>
>>>> Can you write this in a comment above vcpu->arch.mmu.page_fault()?
>>>
>>> Okay, if Marcelo does not object this way. :)
>>
>> I do object, since it is possible to detect precisely the condition by 
>> storing which gfns have been cached.
>>
>> Then, Xiao, you need a way to handle large read-only sptes.
> 
> Sorry, Marcelo, i am still confused why read-only sptes can not work
> under this patch?
> 
> The code after read-only large spte is:
> 
> +		if ((level > PT_PAGE_TABLE_LEVEL &&
> +		   has_wrprotected_page(vcpu->kvm, gfn, level)) ||
> +		      mmu_need_write_protect(vcpu, gfn, can_unsync)) {
>  			pgprintk("%s: found shadow page for %llx, marking ro\n",
>  				 __func__, gfn);
>  			ret = 1;
> 
> It return 1, then reexecute_instruction return 0. It is the same as without
> readonly large-spte.

Ah, wait, there is a case: the large page is located at 0-2M, the 0-4K range is
used as a page table (e.g. a PDE), and the guest wants to write the memory located
at 5K, which should be freely writable. This patch returns 0 for both the current
code and the readonly large spte.

I need to think it more.



* Re: [PATCH 3/3] KVM: x86: improve reexecute_instruction
  2012-11-28 22:40             ` Xiao Guangrong
  2012-11-28 23:16               ` Xiao Guangrong
@ 2012-11-29  0:21               ` Marcelo Tosatti
  2012-12-03  8:33                 ` Xiao Guangrong
  1 sibling, 1 reply; 27+ messages in thread
From: Marcelo Tosatti @ 2012-11-29  0:21 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Gleb Natapov, Avi Kivity, LKML, KVM

On Thu, Nov 29, 2012 at 06:40:51AM +0800, Xiao Guangrong wrote:
> On 11/29/2012 05:57 AM, Marcelo Tosatti wrote:
> > On Wed, Nov 28, 2012 at 10:59:35PM +0800, Xiao Guangrong wrote:
> >> On 11/28/2012 10:12 PM, Gleb Natapov wrote:
> >>> On Tue, Nov 27, 2012 at 11:30:24AM +0800, Xiao Guangrong wrote:
> >>>> On 11/27/2012 06:41 AM, Marcelo Tosatti wrote:
> >>>>
> >>>>>>
> >>>>>> -	return false;
> >>>>>> +again:
> >>>>>> +	page_fault_count = ACCESS_ONCE(vcpu->kvm->arch.page_fault_count);
> >>>>>> +
> >>>>>> +	/*
> >>>>>> +	 * if emulation was due to access to shadowed page table
> >>>>>> +	 * and it failed try to unshadow page and re-enter the
> >>>>>> +	 * guest to let CPU execute the instruction.
> >>>>>> +	 */
> >>>>>> +	kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
> >>>>>> +	emulate = vcpu->arch.mmu.page_fault(vcpu, cr3, PFERR_WRITE_MASK, false);
> >>>>>
> >>>>> Can you explain what is the objective here?
> >>>>>
> >>>>
> >>>> Sure. :)
> >>>>
> >>>> The instruction emulation is caused by fault access on cr3. After unprotect
> >>>> the target page, we call vcpu->arch.mmu.page_fault to fix the mapping of cr3.
> >>>> if it return 1, mmu can not fix the mapping, we should report the error,
> >>>> otherwise it is good to return to guest and let it re-execute the instruction
> >>>> again.
> >>>>
> >>>> page_fault_count is used to avoid the race on other vcpus, since after we
> >>>> unprotect the target page, other cpu can enter page fault path and let the
> >>>> page be write-protected again.
> >>>>
> >>>> This way can help us to detect all the case that mmu can not be fixed.
> >>>>
> >>> Can you write this in a comment above vcpu->arch.mmu.page_fault()?
> >>
> >> Okay, if Marcelo does not object this way. :)
> > 
> > I do object, since it is possible to detect precisely the condition by 
> > storing which gfns have been cached.
> > 
> > Then, Xiao, you need a way to handle large read-only sptes.
> 
> Sorry, Marcelo, i am still confused why read-only sptes can not work
> under this patch?
> 
> The code after read-only large spte is:
> 
> +		if ((level > PT_PAGE_TABLE_LEVEL &&
> +		   has_wrprotected_page(vcpu->kvm, gfn, level)) ||
> +		      mmu_need_write_protect(vcpu, gfn, can_unsync)) {
>  			pgprintk("%s: found shadow page for %llx, marking ro\n",
>  				 __func__, gfn);
>  			ret = 1;
> 
> It return 1, then reexecute_instruction return 0. It is the same as without
> readonly large-spte.

https://lkml.org/lkml/2012/11/17/75

Does unshadowing work with large sptes at reexecute_instruction? That
is, do we nuke any large read-only sptes that might be causing a certain
gfn to be read-only?

That is, following the sequence there, is the large read-only spte
properly destroyed?



* Re: [PATCH 3/3] KVM: x86: improve reexecute_instruction
  2012-11-28 23:16               ` Xiao Guangrong
@ 2012-11-29  0:23                 ` Marcelo Tosatti
  0 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2012-11-29  0:23 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Gleb Natapov, Avi Kivity, LKML, KVM

On Thu, Nov 29, 2012 at 07:16:50AM +0800, Xiao Guangrong wrote:
> On 11/29/2012 06:40 AM, Xiao Guangrong wrote:
> > On 11/29/2012 05:57 AM, Marcelo Tosatti wrote:
> >> On Wed, Nov 28, 2012 at 10:59:35PM +0800, Xiao Guangrong wrote:
> >>> On 11/28/2012 10:12 PM, Gleb Natapov wrote:
> >>>> On Tue, Nov 27, 2012 at 11:30:24AM +0800, Xiao Guangrong wrote:
> >>>>> On 11/27/2012 06:41 AM, Marcelo Tosatti wrote:
> >>>>>
> >>>>>>>
> >>>>>>> -	return false;
> >>>>>>> +again:
> >>>>>>> +	page_fault_count = ACCESS_ONCE(vcpu->kvm->arch.page_fault_count);
> >>>>>>> +
> >>>>>>> +	/*
> >>>>>>> +	 * if emulation was due to access to shadowed page table
> >>>>>>> +	 * and it failed try to unshadow page and re-enter the
> >>>>>>> +	 * guest to let CPU execute the instruction.
> >>>>>>> +	 */
> >>>>>>> +	kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa));
> >>>>>>> +	emulate = vcpu->arch.mmu.page_fault(vcpu, cr3, PFERR_WRITE_MASK, false);
> >>>>>>
> >>>>>> Can you explain what is the objective here?
> >>>>>>
> >>>>>
> >>>>> Sure. :)
> >>>>>
> >>>>> The instruction emulation is caused by fault access on cr3. After unprotect
> >>>>> the target page, we call vcpu->arch.mmu.page_fault to fix the mapping of cr3.
> >>>>> if it return 1, mmu can not fix the mapping, we should report the error,
> >>>>> otherwise it is good to return to guest and let it re-execute the instruction
> >>>>> again.
> >>>>>
> >>>>> page_fault_count is used to avoid the race on other vcpus, since after we
> >>>>> unprotect the target page, other cpu can enter page fault path and let the
> >>>>> page be write-protected again.
> >>>>>
> >>>>> This way can help us to detect all the case that mmu can not be fixed.
> >>>>>
> >>>> Can you write this in a comment above vcpu->arch.mmu.page_fault()?
> >>>
> >>> Okay, if Marcelo does not object this way. :)
> >>
> >> I do object, since it is possible to detect precisely the condition by 
> >> storing which gfns have been cached.
> >>
> >> Then, Xiao, you need a way to handle large read-only sptes.
> > 
> > Sorry, Marcelo, i am still confused why read-only sptes can not work
> > under this patch?
> > 
> > The code after read-only large spte is:
> > 
> > +		if ((level > PT_PAGE_TABLE_LEVEL &&
> > +		   has_wrprotected_page(vcpu->kvm, gfn, level)) ||
> > +		      mmu_need_write_protect(vcpu, gfn, can_unsync)) {
> >  			pgprintk("%s: found shadow page for %llx, marking ro\n",
> >  				 __func__, gfn);
> >  			ret = 1;
> > 
> > It return 1, then reexecute_instruction return 0. It is the same as without
> > readonly large-spte.
> 
> Ah, wait, there is a case: the large page is located at 0-2M, the 0-4K range is
> used as a page table (e.g. a PDE), and the guest wants to write the memory located
> at 5K, which should be freely writable. This patch returns 0 for both the current
> code and the readonly large spte.

Yes, we should remove the read-only large spte if any write to 0-2M fails
(I said 'unshadow' in the previous email, but the correct term is 'remove
the large spte in the range').

> I need to think it more.




* Re: [PATCH 3/3] KVM: x86: improve reexecute_instruction
  2012-11-29  0:21               ` Marcelo Tosatti
@ 2012-12-03  8:33                 ` Xiao Guangrong
  2012-12-03 19:47                   ` Marcelo Tosatti
  0 siblings, 1 reply; 27+ messages in thread
From: Xiao Guangrong @ 2012-12-03  8:33 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Gleb Natapov, Avi Kivity, LKML, KVM

Hi Marcelo,

Thanks for your patience. I was reading your reply over and over again, i would
like to argue it more :).
Please see below.

On 11/29/2012 08:21 AM, Marcelo Tosatti wrote:

> 
> https://lkml.org/lkml/2012/11/17/75
> 
> Does unshadowing work with large sptes at reexecute_instruction? That
> is, do we nuke any large read-only sptes that might be causing a certain
> gfn to be read-only?
> 
> That is, following the sequence there, is the large read-only spte
> properly destroyed?

Actually, sptes can not prevent a gfn from becoming writable; that means the gfn
can become writable *even if* there exists a spte which maps to the gfn.

The condition that can prevent a gfn from becoming writable is that the gfn has been
shadowed; that means the gfn can not become writable if there exists a sp
with sp.gfn = gfn.

Note, drop_spte does not remove any sp. So, either destroying spte or keeping
spte doest not have any affect for gfn write-protection.
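As a small illustration of this point, the sketch below models the write-protection condition. The struct sp and gfn_write_protected names are hypothetical, chosen to mirror the argument: protection depends only on whether a shadow page with sp->gfn == gfn exists, so dropping a spte that merely maps the gfn changes nothing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy shadow page: one exists when the guest uses the gfn as a page table. */
struct sp {
    unsigned long gfn;
};

/* A gfn must stay read-only iff some shadow page has sp->gfn == gfn.
 * Sptes that merely *map* the gfn impose no such constraint, which is
 * why drop_spte() cannot lift the write protection. */
static bool gfn_write_protected(const struct sp *sps, size_t n,
                                unsigned long gfn)
{
    for (size_t i = 0; i < n; i++)
        if (sps[i].gfn == gfn)
            return true;
    return false;
}
```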

If I am lucky enough that my point is right, the current code can be optimized
as below:

                if (level > PT_PAGE_TABLE_LEVEL &&
-                   has_wrprotected_page(vcpu->kvm, gfn, level)) {
-                       ret = 1;
-                       drop_spte(vcpu->kvm, sptep);
-                       goto done;
-               }
+                   has_wrprotected_page(vcpu->kvm, gfn, level))
+                       return 0;


1): we can return 0 instead of 1 to avoid unnecessary emulation. The vcpu will
    refault again, and then kvm will use a small page.

2): no change to the spte is needed.
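The two points can be sketched as a toy model of the proposed set_spte() change. The names set_spte_sketch, wrprotected_in_range and installed_level are hypothetical; the sketch only shows the control flow: return 0 without touching the spte, let the vcpu refault, and let the refault fall back to a 4K mapping.

```c
#include <stdbool.h>

#define PT_PAGE_TABLE_LEVEL 1

static bool wrprotected_in_range = true; /* a gfn in the 2M range is shadowed */
static int installed_level;              /* level of the spte we end up with */

/* Proposed behaviour: when a large mapping would cover a write-protected
 * gfn, return 0 ("no emulation") and leave the spte untouched, instead
 * of the old ret = 1 + drop_spte() + goto done. */
static int set_spte_sketch(int level)
{
    if (level > PT_PAGE_TABLE_LEVEL && wrprotected_in_range)
        return 0;                    /* refault instead of emulating */

    installed_level = level;         /* install the spte */
    return 0;
}

/* The refault falls back to PT_PAGE_TABLE_LEVEL, so the guest ends up
 * mapped with a small page and no emulation was ever needed. */
static void fault_then_refault(void)
{
    set_spte_sketch(2);                   /* first fault: 2M mapping refused */
    set_spte_sketch(PT_PAGE_TABLE_LEVEL); /* refault: mapped at 4K */
}
```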



* Re: [PATCH 3/3] KVM: x86: improve reexecute_instruction
  2012-12-03  8:33                 ` Xiao Guangrong
@ 2012-12-03 19:47                   ` Marcelo Tosatti
  0 siblings, 0 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2012-12-03 19:47 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Gleb Natapov, Avi Kivity, LKML, KVM

On Mon, Dec 03, 2012 at 04:33:01PM +0800, Xiao Guangrong wrote:
> Hi Marcelo,
> 
> Thanks for your patience. I have been reading your reply over and over again,
> and I would like to argue the point a bit more :).
> Please see below.
> 
> On 11/29/2012 08:21 AM, Marcelo Tosatti wrote:
> 
> > 
> > https://lkml.org/lkml/2012/11/17/75
> > 
> > Does unshadowing work with large sptes at reexecute_instruction? That
> > is, do we nuke any large read-only sptes that might be causing a certain
> > gfn to be read-only?
> > 
> > That is, following the sequence there, is the large read-only spte
> > properly destroyed?
> 
> Actually, sptes cannot prevent a gfn from becoming writable; that is, the gfn
> can become writable *even if* there exists a spte which maps the gfn.
> 
> The condition that prevents a gfn from becoming writable is that the gfn has
> been shadowed; that is, the gfn cannot become writable if there exists a sp
> with sp.gfn == gfn.
> 
> Note, drop_spte does not remove any sp. So destroying or keeping the spte
> does not have any effect on gfn write-protection.
> 
> If I am lucky enough that my point is right, the current code can be
> optimized as below:
> 
>                 if (level > PT_PAGE_TABLE_LEVEL &&
> -                   has_wrprotected_page(vcpu->kvm, gfn, level)) {
> -                       ret = 1;
> -                       drop_spte(vcpu->kvm, sptep);
> -                       goto done;
> -               }
> +                   has_wrprotected_page(vcpu->kvm, gfn, level))
> +                       return 0;
> 
> 
> 1): we can return 0 instead of 1 to avoid unnecessary emulation. The vcpu
>     will refault again, and then kvm will use a small page.

So on refault the large spte is nuked. That works, yes.

> 2): no change to the spte is needed.



end of thread, other threads:[~2012-12-03 19:47 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-19 23:57 [PATCH 0/3] KVM: x86: improve reexecute_instruction Xiao Guangrong
2012-11-19 23:58 ` [PATCH 1/3] KVM: x86: clean up reexecute_instruction Xiao Guangrong
2012-11-20 12:11   ` Gleb Natapov
2012-11-20 20:13     ` Xiao Guangrong
2012-11-19 23:59 ` [PATCH 2/3] KVM: x86: let reexecute_instruction work for tdp Xiao Guangrong
2012-11-26 22:37   ` Marcelo Tosatti
2012-11-27  3:13     ` Xiao Guangrong
2012-11-27 23:32       ` Marcelo Tosatti
2012-11-28  3:15         ` Xiao Guangrong
2012-11-28 14:01           ` Gleb Natapov
2012-11-28 14:55             ` Xiao Guangrong
2012-11-28 22:07               ` Marcelo Tosatti
2012-11-19 23:59 ` [PATCH 3/3] KVM: x86: improve reexecute_instruction Xiao Guangrong
2012-11-26 22:41   ` Marcelo Tosatti
2012-11-27  3:30     ` Xiao Guangrong
2012-11-27 23:42       ` Marcelo Tosatti
2012-11-28  3:33         ` Xiao Guangrong
2012-11-28 14:12       ` Gleb Natapov
2012-11-28 14:59         ` Xiao Guangrong
2012-11-28 21:57           ` Marcelo Tosatti
2012-11-28 22:40             ` Xiao Guangrong
2012-11-28 23:16               ` Xiao Guangrong
2012-11-29  0:23                 ` Marcelo Tosatti
2012-11-29  0:21               ` Marcelo Tosatti
2012-12-03  8:33                 ` Xiao Guangrong
2012-12-03 19:47                   ` Marcelo Tosatti
2012-11-23  1:16 ` [PATCH 0/3] " Marcelo Tosatti
