* [PATCH 0/3] ksm support for kvm
@ 2009-09-10 16:38 Izik Eidus
  2009-09-10 16:38 ` [PATCH 1/3] kvm: dont hold pagecount reference for mapped sptes pages Izik Eidus
  0 siblings, 1 reply; 24+ messages in thread
From: Izik Eidus @ 2009-09-10 16:38 UTC (permalink / raw)
  To: avi; +Cc: kvm, aarcange

Hi,

The following series adds ksm support to the kvm mmu.

Thanks.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/3] kvm: dont hold pagecount reference for mapped sptes pages
  2009-09-10 16:38 [PATCH 0/3] ksm support for kvm Izik Eidus
@ 2009-09-10 16:38 ` Izik Eidus
  2009-09-10 16:38   ` [PATCH 2/3] add SPTE_HOST_WRITEABLE flag to the shadow ptes Izik Eidus
  0 siblings, 1 reply; 24+ messages in thread
From: Izik Eidus @ 2009-09-10 16:38 UTC (permalink / raw)
  To: avi; +Cc: kvm, aarcange, Izik Eidus

When using mmu notifiers, we are allowed to remove the page count
reference taken by get_user_pages on a specific page that is mapped
inside the shadow page tables.

This is needed so that the pagecount can be balanced against the
mapcount when checking.

(Right now kvm increases the pagecount and does not increase the
mapcount when mapping a page into a shadow page table entry,
so when comparing pagecount against mapcount, you get no
reliable result.)
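
For illustration, this is roughly the kind of check ksm relies on (a
sketch, not the exact ksm code): a reference held via get_user_pages()
inflates page_count() without raising page_mapcount(), which is what
makes the comparison unreliable.

	/* sketch: the page is referenced only through its mappings,
	 * plus the single reference the caller itself holds */
	static bool only_mapped_references(struct page *page)
	{
		return page_mapcount(page) + 1 == page_count(page);
	}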

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 arch/x86/kvm/mmu.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f76d086..62d2f86 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -634,9 +634,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
 	if (*spte & shadow_accessed_mask)
 		kvm_set_pfn_accessed(pfn);
 	if (is_writeble_pte(*spte))
-		kvm_release_pfn_dirty(pfn);
-	else
-		kvm_release_pfn_clean(pfn);
+		kvm_set_pfn_dirty(pfn);
 	rmapp = gfn_to_rmap(kvm, sp->gfns[spte - sp->spt], sp->role.level);
 	if (!*rmapp) {
 		printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", spte, *spte);
@@ -1877,8 +1875,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	page_header_update_slot(vcpu->kvm, sptep, gfn);
 	if (!was_rmapped) {
 		rmap_count = rmap_add(vcpu, sptep, gfn);
-		if (!is_rmap_spte(*sptep))
-			kvm_release_pfn_clean(pfn);
+		kvm_release_pfn_clean(pfn);
 		if (rmap_count > RMAP_RECYCLE_THRESHOLD)
 			rmap_recycle(vcpu, sptep, gfn);
 	} else {
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/3] add SPTE_HOST_WRITEABLE flag to the shadow ptes
  2009-09-10 16:38 ` [PATCH 1/3] kvm: dont hold pagecount reference for mapped sptes pages Izik Eidus
@ 2009-09-10 16:38   ` Izik Eidus
  2009-09-10 16:38     ` [PATCH 3/3] add support for change_pte mmu notifiers Izik Eidus
                       ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Izik Eidus @ 2009-09-10 16:38 UTC (permalink / raw)
  To: avi; +Cc: kvm, aarcange, Izik Eidus

This flag indicates that the host physical page we are pointing to from
the spte is write protected, and therefore we can't change its access
to writable unless we run get_user_pages(write = 1).

(this is needed for change_pte support in kvm)
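
In rough terms, the invariant the new bit encodes (an illustrative
sketch; the helper name is made up and not part of this patch):

	/*
	 * SPTE_HOST_WRITEABLE set: the pfn behind this spte came from
	 * get_user_pages(write = 1), so the spte may be made writable.
	 * Bit clear: the host pte was read-only (e.g. a shared ksm page),
	 * so granting write access first needs a new fault through
	 * get_user_pages(write = 1).
	 */
	static bool spte_can_be_made_writable(u64 spte)
	{
		return spte & SPTE_HOST_WRITEABLE;
	}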

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 arch/x86/kvm/mmu.c         |   15 +++++++++++----
 arch/x86/kvm/paging_tmpl.h |   18 +++++++++++++++---
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 62d2f86..a7151b8 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -156,6 +156,8 @@ module_param(oos_shadow, bool, 0644);
 #define CREATE_TRACE_POINTS
 #include "mmutrace.h"
 
+#define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
+
 #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
 
 struct kvm_rmap_desc {
@@ -1754,7 +1756,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		    unsigned pte_access, int user_fault,
 		    int write_fault, int dirty, int level,
 		    gfn_t gfn, pfn_t pfn, bool speculative,
-		    bool can_unsync)
+		    bool can_unsync, bool reset_host_protection)
 {
 	u64 spte;
 	int ret = 0;
@@ -1781,6 +1783,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
 			kvm_is_mmio_pfn(pfn));
 
+	if (reset_host_protection)
+		spte |= SPTE_HOST_WRITEABLE;
+
 	spte |= (u64)pfn << PAGE_SHIFT;
 
 	if ((pte_access & ACC_WRITE_MASK)
@@ -1826,7 +1831,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 			 unsigned pt_access, unsigned pte_access,
 			 int user_fault, int write_fault, int dirty,
 			 int *ptwrite, int level, gfn_t gfn,
-			 pfn_t pfn, bool speculative)
+			 pfn_t pfn, bool speculative,
+			 bool reset_host_protection)
 {
 	int was_rmapped = 0;
 	int was_writeble = is_writeble_pte(*sptep);
@@ -1858,7 +1864,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	}
 
 	if (set_spte(vcpu, sptep, pte_access, user_fault, write_fault,
-		      dirty, level, gfn, pfn, speculative, true)) {
+		      dirty, level, gfn, pfn, speculative, true,
+		      reset_host_protection)) {
 		if (write_fault)
 			*ptwrite = 1;
 		kvm_x86_ops->tlb_flush(vcpu);
@@ -1906,7 +1913,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 		if (iterator.level == level) {
 			mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, ACC_ALL,
 				     0, write, 1, &pt_write,
-				     level, gfn, pfn, false);
+				     level, gfn, pfn, false, true);
 			++vcpu->stat.pf_fixed;
 			break;
 		}
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index d2fec9c..c9256ee 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -273,9 +273,13 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
 	if (mmu_notifier_retry(vcpu, vcpu->arch.update_pte.mmu_seq))
 		return;
 	kvm_get_pfn(pfn);
+ 	/*
+ 	 * we call mmu_set_spte() with reset_host_protection = true because
+ 	 * vcpu->arch.update_pte.pfn was fetched from get_user_pages(write = 1).
+ 	 */
 	mmu_set_spte(vcpu, spte, page->role.access, pte_access, 0, 0,
 		     gpte & PT_DIRTY_MASK, NULL, PT_PAGE_TABLE_LEVEL,
-		     gpte_to_gfn(gpte), pfn, true);
+		     gpte_to_gfn(gpte), pfn, true, true);
 }
 
 /*
@@ -308,7 +312,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 				     user_fault, write_fault,
 				     gw->ptes[gw->level-1] & PT_DIRTY_MASK,
 				     ptwrite, level,
-				     gw->gfn, pfn, false);
+				     gw->gfn, pfn, false, true);
 			break;
 		}
 
@@ -558,6 +562,7 @@ static void FNAME(prefetch_page)(struct kvm_vcpu *vcpu,
 static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
 	int i, offset, nr_present;
+        bool reset_host_protection;
 
 	offset = nr_present = 0;
 
@@ -595,9 +600,16 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 
 		nr_present++;
 		pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
+		if (!(sp->spt[i] & SPTE_HOST_WRITEABLE)) {
+ 			pte_access &= ~PT_WRITABLE_MASK;
+			reset_host_protection = 0;
+		} else {
+			reset_host_protection = 1;
+		}
 		set_spte(vcpu, &sp->spt[i], pte_access, 0, 0,
 			 is_dirty_gpte(gpte), PT_PAGE_TABLE_LEVEL, gfn,
-			 spte_to_pfn(sp->spt[i]), true, false);
+			 spte_to_pfn(sp->spt[i]), true, false,
+			 reset_host_protection);
 	}
 
 	return !nr_present;
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/3] add support for change_pte mmu notifiers
  2009-09-10 16:38   ` [PATCH 2/3] add SPTE_HOST_WRITEABLE flag to the shadow ptes Izik Eidus
@ 2009-09-10 16:38     ` Izik Eidus
  2009-09-11 21:24       ` Marcelo Tosatti
  2009-09-11 21:14     ` [PATCH 2/3] add SPTE_HOST_WRITEABLE flag to the shadow ptes Marcelo Tosatti
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 24+ messages in thread
From: Izik Eidus @ 2009-09-10 16:38 UTC (permalink / raw)
  To: avi; +Cc: kvm, aarcange, Izik Eidus

This is needed for kvm if it wants ksm to directly map pages into its
shadow page tables.
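
A sketch of the intended use (illustrative only; set_pte_at_notify() is
assumed from the companion mmu-notifier patches and is not part of this
patch):

	/*
	 * When ksm replaces the page behind a host pte, it installs the
	 * new pte through a notifying helper, which invokes ->change_pte()
	 * and ends up in kvm_set_spte_hva() below, letting kvm rewrite
	 * the matching sptes in place instead of zapping and refaulting.
	 */
	static void example_replace_host_pte(struct mm_struct *mm,
					     unsigned long addr,
					     pte_t *ptep, pte_t newpte)
	{
		set_pte_at_notify(mm, addr, ptep, newpte);
	}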

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/mmu.c              |   70 ++++++++++++++++++++++++++++++++++----
 virt/kvm/kvm_main.c             |   14 ++++++++
 3 files changed, 77 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6046e6f..594d131 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -797,6 +797,7 @@ asmlinkage void kvm_handle_fault_on_reboot(void);
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
 int kvm_age_hva(struct kvm *kvm, unsigned long hva);
+void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 int cpuid_maxphyaddr(struct kvm_vcpu *vcpu);
 int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a7151b8..3fd19f2 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -282,6 +282,11 @@ static pfn_t spte_to_pfn(u64 pte)
 	return (pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
 }
 
+static pte_t ptep_val(pte_t *ptep)
+{
+	return *ptep;
+}
+
 static gfn_t pse36_gfn_delta(u32 gpte)
 {
 	int shift = 32 - PT32_DIR_PSE36_SHIFT - PAGE_SHIFT;
@@ -748,7 +753,8 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
 	return write_protected;
 }
 
-static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
+static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
+			   unsigned long data)
 {
 	u64 *spte;
 	int need_tlb_flush = 0;
@@ -763,8 +769,48 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
 	return need_tlb_flush;
 }
 
+static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
+			     unsigned long data)
+{
+	int need_flush = 0;
+	u64 *spte, new_spte;
+	pte_t *ptep = (pte_t *)data;
+	pfn_t new_pfn;
+
+	new_pfn = pte_pfn(ptep_val(ptep));
+	spte = rmap_next(kvm, rmapp, NULL);
+	while (spte) {
+		BUG_ON(!is_shadow_present_pte(*spte));
+		rmap_printk("kvm_set_pte_rmapp: spte %p %llx\n", spte, *spte);
+		need_flush = 1;
+		if (pte_write(ptep_val(ptep))) {
+			rmap_remove(kvm, spte);
+			__set_spte(spte, shadow_trap_nonpresent_pte);
+			spte = rmap_next(kvm, rmapp, NULL);
+		} else {
+			new_spte = *spte &~ (PT64_BASE_ADDR_MASK);
+			new_spte |= new_pfn << PAGE_SHIFT;
+
+			if (!pte_write(ptep_val(ptep))) {
+				new_spte &= ~PT_WRITABLE_MASK;
+				new_spte &= ~SPTE_HOST_WRITEABLE;
+				if (is_writeble_pte(*spte))
+					kvm_set_pfn_dirty(spte_to_pfn(*spte));
+			}
+			__set_spte(spte, new_spte);
+			spte = rmap_next(kvm, rmapp, spte);
+		}
+	}
+	if (need_flush)
+		kvm_flush_remote_tlbs(kvm);
+
+	return 0;
+}
+
 static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
-			  int (*handler)(struct kvm *kvm, unsigned long *rmapp))
+			  unsigned long data,
+			  int (*handler)(struct kvm *kvm, unsigned long *rmapp,
+					 unsigned long data))
 {
 	int i, j;
 	int retval = 0;
@@ -786,13 +832,15 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 		if (hva >= start && hva < end) {
 			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
 
-			retval |= handler(kvm, &memslot->rmap[gfn_offset]);
+			retval |= handler(kvm, &memslot->rmap[gfn_offset],
+					  data);
 
 			for (j = 0; j < KVM_NR_PAGE_SIZES - 1; ++j) {
 				int idx = gfn_offset;
 				idx /= KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL + j);
 				retval |= handler(kvm,
-					&memslot->lpage_info[j][idx].rmap_pde);
+					&memslot->lpage_info[j][idx].rmap_pde,
+					data);
 			}
 		}
 	}
@@ -802,10 +850,16 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 
 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
 {
-	return kvm_handle_hva(kvm, hva, kvm_unmap_rmapp);
+	return kvm_handle_hva(kvm, hva, 0, kvm_unmap_rmapp);
+}
+
+void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
+{
+	kvm_handle_hva(kvm, hva, (unsigned long)&pte, kvm_set_pte_rmapp);
 }
 
-static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp)
+static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
+			 unsigned long data)
 {
 	u64 *spte;
 	int young = 0;
@@ -841,13 +895,13 @@ static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
 	gfn = unalias_gfn(vcpu->kvm, gfn);
 	rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level);
 
-	kvm_unmap_rmapp(vcpu->kvm, rmapp);
+	kvm_unmap_rmapp(vcpu->kvm, rmapp, 0);
 	kvm_flush_remote_tlbs(vcpu->kvm);
 }
 
 int kvm_age_hva(struct kvm *kvm, unsigned long hva)
 {
-	return kvm_handle_hva(kvm, hva, kvm_age_rmapp);
+	return kvm_handle_hva(kvm, hva, 0, kvm_age_rmapp);
 }
 
 #ifdef MMU_DEBUG
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index cb7966f..c9a88af 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -240,6 +240,19 @@ static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
 
 }
 
+static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
+					struct mm_struct *mm,
+					unsigned long address,
+					pte_t pte)
+{
+	struct kvm *kvm = mmu_notifier_to_kvm(mn);
+
+	spin_lock(&kvm->mmu_lock);
+	kvm->mmu_notifier_seq++;
+	kvm_set_spte_hva(kvm, address, pte);
+	spin_unlock(&kvm->mmu_lock);
+}
+
 static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 						    struct mm_struct *mm,
 						    unsigned long start,
@@ -319,6 +332,7 @@ static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
 	.invalidate_range_start	= kvm_mmu_notifier_invalidate_range_start,
 	.invalidate_range_end	= kvm_mmu_notifier_invalidate_range_end,
 	.clear_flush_young	= kvm_mmu_notifier_clear_flush_young,
+	.change_pte		= kvm_mmu_notifier_change_pte,
 	.release		= kvm_mmu_notifier_release,
 };
 #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] add SPTE_HOST_WRITEABLE flag to the shadow ptes
  2009-09-10 16:38   ` [PATCH 2/3] add SPTE_HOST_WRITEABLE flag to the shadow ptes Izik Eidus
  2009-09-10 16:38     ` [PATCH 3/3] add support for change_pte mmu notifiers Izik Eidus
@ 2009-09-11 21:14     ` Marcelo Tosatti
  2009-09-12  6:35       ` Izik Eidus
  2009-09-14  5:18     ` Avi Kivity
  2009-09-14  8:34     ` Marcelo Tosatti
  3 siblings, 1 reply; 24+ messages in thread
From: Marcelo Tosatti @ 2009-09-11 21:14 UTC (permalink / raw)
  To: Izik Eidus; +Cc: avi, kvm, aarcange

On Thu, Sep 10, 2009 at 07:38:57PM +0300, Izik Eidus wrote:
> this flag notify that the host physical page we are pointing to from
> the spte is write protected, and therefore we cant change its access
> to be write unless we run get_user_pages(write = 1).
> 
> (this is needed for change_pte support in kvm)
> 
> Signed-off-by: Izik Eidus <ieidus@redhat.com>
> ---
>  arch/x86/kvm/mmu.c         |   15 +++++++++++----
>  arch/x86/kvm/paging_tmpl.h |   18 +++++++++++++++---
>  2 files changed, 26 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 62d2f86..a7151b8 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -156,6 +156,8 @@ module_param(oos_shadow, bool, 0644);
>  #define CREATE_TRACE_POINTS
>  #include "mmutrace.h"
>  
> +#define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
> +
>  #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
>  
>  struct kvm_rmap_desc {
> @@ -1754,7 +1756,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  		    unsigned pte_access, int user_fault,
>  		    int write_fault, int dirty, int level,
>  		    gfn_t gfn, pfn_t pfn, bool speculative,
> -		    bool can_unsync)
> +		    bool can_unsync, bool reset_host_protection)
				     bool host_pte_writeable ?

>  {
>  	u64 spte;
>  	int ret = 0;
> @@ -1781,6 +1783,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
>  			kvm_is_mmio_pfn(pfn));
>  
> +	if (reset_host_protection)
> +		spte |= SPTE_HOST_WRITEABLE;
> +
>  	spte |= (u64)pfn << PAGE_SHIFT;
>  
>  	if ((pte_access & ACC_WRITE_MASK)
> @@ -1826,7 +1831,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  			 unsigned pt_access, unsigned pte_access,
>  			 int user_fault, int write_fault, int dirty,
>  			 int *ptwrite, int level, gfn_t gfn,
> -			 pfn_t pfn, bool speculative)
> +			 pfn_t pfn, bool speculative,
> +			 bool reset_host_protection)
>  {
>  	int was_rmapped = 0;
>  	int was_writeble = is_writeble_pte(*sptep);
> @@ -1858,7 +1864,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  	}
>  
>  	if (set_spte(vcpu, sptep, pte_access, user_fault, write_fault,
> -		      dirty, level, gfn, pfn, speculative, true)) {
> +		      dirty, level, gfn, pfn, speculative, true,
> +		      reset_host_protection)) {
>  		if (write_fault)
>  			*ptwrite = 1;
>  		kvm_x86_ops->tlb_flush(vcpu);
> @@ -1906,7 +1913,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
>  		if (iterator.level == level) {
>  			mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, ACC_ALL,
>  				     0, write, 1, &pt_write,
> -				     level, gfn, pfn, false);
> +				     level, gfn, pfn, false, true);
>  			++vcpu->stat.pf_fixed;
>  			break;
>  		}
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index d2fec9c..c9256ee 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -273,9 +273,13 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
>  	if (mmu_notifier_retry(vcpu, vcpu->arch.update_pte.mmu_seq))
>  		return;
>  	kvm_get_pfn(pfn);
> + 	/*
> + 	 * we call mmu_set_spte() with reset_host_protection = true beacuse that
> + 	 * vcpu->arch.update_pte.pfn was fetched from get_user_pages(write = 1).
> + 	 */
>  	mmu_set_spte(vcpu, spte, page->role.access, pte_access, 0, 0,
>  		     gpte & PT_DIRTY_MASK, NULL, PT_PAGE_TABLE_LEVEL,
> -		     gpte_to_gfn(gpte), pfn, true);
> +		     gpte_to_gfn(gpte), pfn, true, true);
>  }
>  
>  /*
> @@ -308,7 +312,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
>  				     user_fault, write_fault,
>  				     gw->ptes[gw->level-1] & PT_DIRTY_MASK,
>  				     ptwrite, level,
> -				     gw->gfn, pfn, false);
> +				     gw->gfn, pfn, false, true);
>  			break;
>  		}
>  
> @@ -558,6 +562,7 @@ static void FNAME(prefetch_page)(struct kvm_vcpu *vcpu,
>  static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>  {
>  	int i, offset, nr_present;
> +        bool reset_host_protection;

Tab damaged.

>  	offset = nr_present = 0;
>  
> @@ -595,9 +600,16 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>  
>  		nr_present++;
>  		pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
> +		if (!(sp->spt[i] & SPTE_HOST_WRITEABLE)) {
> + 			pte_access &= ~PT_WRITABLE_MASK;
> +			reset_host_protection = 0;
> +		} else {
> +			reset_host_protection = 1;
> +		}
>  		set_spte(vcpu, &sp->spt[i], pte_access, 0, 0,
>  			 is_dirty_gpte(gpte), PT_PAGE_TABLE_LEVEL, gfn,
> -			 spte_to_pfn(sp->spt[i]), true, false);
> +			 spte_to_pfn(sp->spt[i]), true, false,
> +			 reset_host_protection);
>  	}
>  
>  	return !nr_present;

Looks good otherwise.

> -- 
> 1.5.6.5
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/3] add support for change_pte mmu notifiers
  2009-09-10 16:38     ` [PATCH 3/3] add support for change_pte mmu notifiers Izik Eidus
@ 2009-09-11 21:24       ` Marcelo Tosatti
  2009-09-12  6:41         ` Izik Eidus
  0 siblings, 1 reply; 24+ messages in thread
From: Marcelo Tosatti @ 2009-09-11 21:24 UTC (permalink / raw)
  To: Izik Eidus; +Cc: avi, kvm, aarcange

On Thu, Sep 10, 2009 at 07:38:58PM +0300, Izik Eidus wrote:
> this is needed for kvm if it want ksm to directly map pages into its
> shadow page tables.
> 
> Signed-off-by: Izik Eidus <ieidus@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |    1 +
>  arch/x86/kvm/mmu.c              |   70 ++++++++++++++++++++++++++++++++++----
>  virt/kvm/kvm_main.c             |   14 ++++++++
>  3 files changed, 77 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 6046e6f..594d131 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -797,6 +797,7 @@ asmlinkage void kvm_handle_fault_on_reboot(void);
>  #define KVM_ARCH_WANT_MMU_NOTIFIER
>  int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
>  int kvm_age_hva(struct kvm *kvm, unsigned long hva);
> +void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
>  int cpuid_maxphyaddr(struct kvm_vcpu *vcpu);
>  int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
>  int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index a7151b8..3fd19f2 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -282,6 +282,11 @@ static pfn_t spte_to_pfn(u64 pte)
>  	return (pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
>  }
>  
> +static pte_t ptep_val(pte_t *ptep)
> +{
> +	return *ptep;
> +}
> +
>  static gfn_t pse36_gfn_delta(u32 gpte)
>  {
>  	int shift = 32 - PT32_DIR_PSE36_SHIFT - PAGE_SHIFT;
> @@ -748,7 +753,8 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
>  	return write_protected;
>  }
>  
> -static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
> +static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
> +			   unsigned long data)
>  {
>  	u64 *spte;
>  	int need_tlb_flush = 0;
> @@ -763,8 +769,48 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
>  	return need_tlb_flush;
>  }
>  
> +static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
> +			     unsigned long data)
> +{
> +	int need_flush = 0;
> +	u64 *spte, new_spte;
> +	pte_t *ptep = (pte_t *)data;
> +	pfn_t new_pfn;
> +
> +	new_pfn = pte_pfn(ptep_val(ptep));
> +	spte = rmap_next(kvm, rmapp, NULL);
> +	while (spte) {
> +		BUG_ON(!is_shadow_present_pte(*spte));
> +		rmap_printk("kvm_set_pte_rmapp: spte %p %llx\n", spte, *spte);
> +		need_flush = 1;
> +		if (pte_write(ptep_val(ptep))) {
> +			rmap_remove(kvm, spte);
> +			__set_spte(spte, shadow_trap_nonpresent_pte);
> +			spte = rmap_next(kvm, rmapp, NULL);
> +		} else {
> +			new_spte = *spte &~ (PT64_BASE_ADDR_MASK);
> +			new_spte |= new_pfn << PAGE_SHIFT;
> +
> +			if (!pte_write(ptep_val(ptep))) {
> +				new_spte &= ~PT_WRITABLE_MASK;
> +				new_spte &= ~SPTE_HOST_WRITEABLE;
> +				if (is_writeble_pte(*spte))
> +					kvm_set_pfn_dirty(spte_to_pfn(*spte));
> +			}
> +			__set_spte(spte, new_spte);
> +			spte = rmap_next(kvm, rmapp, spte);
> +		}
> +	}
> +	if (need_flush)
> +		kvm_flush_remote_tlbs(kvm);
> +
> +	return 0;
> +}
> +
>  static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
> -			  int (*handler)(struct kvm *kvm, unsigned long *rmapp))
> +			  unsigned long data,
> +			  int (*handler)(struct kvm *kvm, unsigned long *rmapp,
> +					 unsigned long data))
>  {
>  	int i, j;
>  	int retval = 0;
> @@ -786,13 +832,15 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
>  		if (hva >= start && hva < end) {
>  			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
>  
> -			retval |= handler(kvm, &memslot->rmap[gfn_offset]);
> +			retval |= handler(kvm, &memslot->rmap[gfn_offset],
> +					  data);
>  
>  			for (j = 0; j < KVM_NR_PAGE_SIZES - 1; ++j) {
>  				int idx = gfn_offset;
>  				idx /= KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL + j);
>  				retval |= handler(kvm,
> -					&memslot->lpage_info[j][idx].rmap_pde);
> +					&memslot->lpage_info[j][idx].rmap_pde,
> +					data);

If change_pte is called to modify a largepage pte, and the shadow has
that largepage mapped with 4k sptes, you'll set the wrong pfn. That is,
the patch does not attempt to handle different page sizes properly.

So either disable change_pte updates to non-4k host vmas (making it
explicit that it does not handle different page sizes), or handle
largepages properly, although I don't see any use for change_pte on
largepage mappings?

>  			}
>  		}
>  	}
> @@ -802,10 +850,16 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
>  
>  int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
>  {
> -	return kvm_handle_hva(kvm, hva, kvm_unmap_rmapp);
> +	return kvm_handle_hva(kvm, hva, 0, kvm_unmap_rmapp);
> +}
> +
> +void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
> +{
> +	kvm_handle_hva(kvm, hva, (unsigned long)&pte, kvm_set_pte_rmapp);
>  }
>  
> -static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp)
> +static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
> +			 unsigned long data)
>  {
>  	u64 *spte;
>  	int young = 0;
> @@ -841,13 +895,13 @@ static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
>  	gfn = unalias_gfn(vcpu->kvm, gfn);
>  	rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level);
>  
> -	kvm_unmap_rmapp(vcpu->kvm, rmapp);
> +	kvm_unmap_rmapp(vcpu->kvm, rmapp, 0);
>  	kvm_flush_remote_tlbs(vcpu->kvm);
>  }
>  
>  int kvm_age_hva(struct kvm *kvm, unsigned long hva)
>  {
> -	return kvm_handle_hva(kvm, hva, kvm_age_rmapp);
> +	return kvm_handle_hva(kvm, hva, 0, kvm_age_rmapp);
>  }
>  
>  #ifdef MMU_DEBUG
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index cb7966f..c9a88af 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -240,6 +240,19 @@ static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
>  
>  }
>  
> +static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
> +					struct mm_struct *mm,
> +					unsigned long address,
> +					pte_t pte)
> +{
> +	struct kvm *kvm = mmu_notifier_to_kvm(mn);
> +
> +	spin_lock(&kvm->mmu_lock);
> +	kvm->mmu_notifier_seq++;
> +	kvm_set_spte_hva(kvm, address, pte);
> +	spin_unlock(&kvm->mmu_lock);
> +}
> +
>  static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
>  						    struct mm_struct *mm,
>  						    unsigned long start,
> @@ -319,6 +332,7 @@ static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
>  	.invalidate_range_start	= kvm_mmu_notifier_invalidate_range_start,
>  	.invalidate_range_end	= kvm_mmu_notifier_invalidate_range_end,
>  	.clear_flush_young	= kvm_mmu_notifier_clear_flush_young,
> +	.change_pte		= kvm_mmu_notifier_change_pte,
>  	.release		= kvm_mmu_notifier_release,
>  };
>  #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */
> -- 
> 1.5.6.5
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] add SPTE_HOST_WRITEABLE flag to the shadow ptes
  2009-09-11 21:14     ` [PATCH 2/3] add SPTE_HOST_WRITEABLE flag to the shadow ptes Marcelo Tosatti
@ 2009-09-12  6:35       ` Izik Eidus
  0 siblings, 0 replies; 24+ messages in thread
From: Izik Eidus @ 2009-09-12  6:35 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: avi, kvm, aarcange

Marcelo Tosatti wrote:
> On Thu, Sep 10, 2009 at 07:38:57PM +0300, Izik Eidus wrote:
>   
>> this flag notify that the host physical page we are pointing to from
>> the spte is write protected, and therefore we cant change its access
>> to be write unless we run get_user_pages(write = 1).
>>
>> (this is needed for change_pte support in kvm)
>>
>> Signed-off-by: Izik Eidus <ieidus@redhat.com>
>> ---
>>  arch/x86/kvm/mmu.c         |   15 +++++++++++----
>>  arch/x86/kvm/paging_tmpl.h |   18 +++++++++++++++---
>>  2 files changed, 26 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 62d2f86..a7151b8 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -156,6 +156,8 @@ module_param(oos_shadow, bool, 0644);
>>  #define CREATE_TRACE_POINTS
>>  #include "mmutrace.h"
>>  
>> +#define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
>> +
>>  #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
>>  
>>  struct kvm_rmap_desc {
>> @@ -1754,7 +1756,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>>  		    unsigned pte_access, int user_fault,
>>  		    int write_fault, int dirty, int level,
>>  		    gfn_t gfn, pfn_t pfn, bool speculative,
>> -		    bool can_unsync)
>> +		    bool can_unsync, bool reset_host_protection)
>>     
> 				     bool host_pte_writeable ?
>
>   
>
Sure.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/3] add support for change_pte mmu notifiers
  2009-09-11 21:24       ` Marcelo Tosatti
@ 2009-09-12  6:41         ` Izik Eidus
  2009-09-12 16:18           ` Marcelo Tosatti
  0 siblings, 1 reply; 24+ messages in thread
From: Izik Eidus @ 2009-09-12  6:41 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: avi, kvm, aarcange

Marcelo Tosatti wrote:
> On Thu, Sep 10, 2009 at 07:38:58PM +0300, Izik Eidus wrote:
>   
>> this is needed for kvm if it want ksm to directly map pages into its
>> shadow page tables.
>>
>> Signed-off-by: Izik Eidus <ieidus@redhat.com>
>> ---
>>  arch/x86/include/asm/kvm_host.h |    1 +
>>  arch/x86/kvm/mmu.c              |   70 ++++++++++++++++++++++++++++++++++----
>>  virt/kvm/kvm_main.c             |   14 ++++++++
>>  3 files changed, 77 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 6046e6f..594d131 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -797,6 +797,7 @@ asmlinkage void kvm_handle_fault_on_reboot(void);
>>  #define KVM_ARCH_WANT_MMU_NOTIFIER
>>  int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
>>  int kvm_age_hva(struct kvm *kvm, unsigned long hva);
>> +void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
>>  int cpuid_maxphyaddr(struct kvm_vcpu *vcpu);
>>  int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
>>  int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index a7151b8..3fd19f2 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -282,6 +282,11 @@ static pfn_t spte_to_pfn(u64 pte)
>>  	return (pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
>>  }
>>  
>> +static pte_t ptep_val(pte_t *ptep)
>> +{
>> +	return *ptep;
>> +}
>> +
>>  static gfn_t pse36_gfn_delta(u32 gpte)
>>  {
>>  	int shift = 32 - PT32_DIR_PSE36_SHIFT - PAGE_SHIFT;
>> @@ -748,7 +753,8 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
>>  	return write_protected;
>>  }
>>  
>> -static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
>> +static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
>> +			   unsigned long data)
>>  {
>>  	u64 *spte;
>>  	int need_tlb_flush = 0;
>> @@ -763,8 +769,48 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
>>  	return need_tlb_flush;
>>  }
>>  
>> +static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
>> +			     unsigned long data)
>> +{
>> +	int need_flush = 0;
>> +	u64 *spte, new_spte;
>> +	pte_t *ptep = (pte_t *)data;
>> +	pfn_t new_pfn;
>> +
>> +	new_pfn = pte_pfn(ptep_val(ptep));
>> +	spte = rmap_next(kvm, rmapp, NULL);
>> +	while (spte) {
>> +		BUG_ON(!is_shadow_present_pte(*spte));
>> +		rmap_printk("kvm_set_pte_rmapp: spte %p %llx\n", spte, *spte);
>> +		need_flush = 1;
>> +		if (pte_write(ptep_val(ptep))) {
>> +			rmap_remove(kvm, spte);
>> +			__set_spte(spte, shadow_trap_nonpresent_pte);
>> +			spte = rmap_next(kvm, rmapp, NULL);
>> +		} else {
>> +			new_spte = *spte &~ (PT64_BASE_ADDR_MASK);
>> +			new_spte |= new_pfn << PAGE_SHIFT;
>> +
>> +			if (!pte_write(ptep_val(ptep))) {
>> +				new_spte &= ~PT_WRITABLE_MASK;
>> +				new_spte &= ~SPTE_HOST_WRITEABLE;
>> +				if (is_writeble_pte(*spte))
>> +					kvm_set_pfn_dirty(spte_to_pfn(*spte));
>> +			}
>> +			__set_spte(spte, new_spte);
>> +			spte = rmap_next(kvm, rmapp, spte);
>> +		}
>> +	}
>> +	if (need_flush)
>> +		kvm_flush_remote_tlbs(kvm);
>> +
>> +	return 0;
>> +}
>> +
>>  static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
>> -			  int (*handler)(struct kvm *kvm, unsigned long *rmapp))
>> +			  unsigned long data,
>> +			  int (*handler)(struct kvm *kvm, unsigned long *rmapp,
>> +					 unsigned long data))
>>  {
>>  	int i, j;
>>  	int retval = 0;
>> @@ -786,13 +832,15 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
>>  		if (hva >= start && hva < end) {
>>  			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
>>  
>> -			retval |= handler(kvm, &memslot->rmap[gfn_offset]);
>> +			retval |= handler(kvm, &memslot->rmap[gfn_offset],
>> +					  data);
>>  
>>  			for (j = 0; j < KVM_NR_PAGE_SIZES - 1; ++j) {
>>  				int idx = gfn_offset;
>>  				idx /= KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL + j);
>>  				retval |= handler(kvm,
>> -					&memslot->lpage_info[j][idx].rmap_pde);
>> +					&memslot->lpage_info[j][idx].rmap_pde,
>> +					data);
>>     
>
> If change_pte is called to modify a largepage pte, and the shadow has
> that largepage mapped with 4k sptes, you'll set the wrong pfn. That is,
> the patch does not attempt to handle different page sizes properly.
>
> So either disable change_pte updates to non-4k host vmas (making it
> explictly it does not handle different pagesizes), or handle largepages
> properly, although i don't see any use for change_pte or largepage
> mappings?
>   

change_pte doesn't get called on anything but 4k pages...
So adding a comment to this function saying it only works on 4k pages
would be enough?



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/3] add support for change_pte mmu notifiers
  2009-09-12  6:41         ` Izik Eidus
@ 2009-09-12 16:18           ` Marcelo Tosatti
  2009-09-12 17:04             ` Izik Eidus
  0 siblings, 1 reply; 24+ messages in thread
From: Marcelo Tosatti @ 2009-09-12 16:18 UTC (permalink / raw)
  To: Izik Eidus; +Cc: avi, kvm, aarcange

On Sat, Sep 12, 2009 at 09:41:10AM +0300, Izik Eidus wrote:
> Marcelo Tosatti wrote:
>> On Thu, Sep 10, 2009 at 07:38:58PM +0300, Izik Eidus wrote:
>>   
>>> this is needed for kvm if it want ksm to directly map pages into its
>>> shadow page tables.
>>>
>>> Signed-off-by: Izik Eidus <ieidus@redhat.com>
>>> ---
>>>  arch/x86/include/asm/kvm_host.h |    1 +
>>>  arch/x86/kvm/mmu.c              |   70 ++++++++++++++++++++++++++++++++++----
>>>  virt/kvm/kvm_main.c             |   14 ++++++++
>>>  3 files changed, 77 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>> index 6046e6f..594d131 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -797,6 +797,7 @@ asmlinkage void kvm_handle_fault_on_reboot(void);
>>>  #define KVM_ARCH_WANT_MMU_NOTIFIER
>>>  int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
>>>  int kvm_age_hva(struct kvm *kvm, unsigned long hva);
>>> +void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
>>>  int cpuid_maxphyaddr(struct kvm_vcpu *vcpu);
>>>  int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
>>>  int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>> index a7151b8..3fd19f2 100644
>>> --- a/arch/x86/kvm/mmu.c
>>> +++ b/arch/x86/kvm/mmu.c
>>> @@ -282,6 +282,11 @@ static pfn_t spte_to_pfn(u64 pte)
>>>  	return (pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
>>>  }
>>>  +static pte_t ptep_val(pte_t *ptep)
>>> +{
>>> +	return *ptep;
>>> +}
>>> +
>>>  static gfn_t pse36_gfn_delta(u32 gpte)
>>>  {
>>>  	int shift = 32 - PT32_DIR_PSE36_SHIFT - PAGE_SHIFT;
>>> @@ -748,7 +753,8 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
>>>  	return write_protected;
>>>  }
>>>  -static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
>>> +static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
>>> +			   unsigned long data)
>>>  {
>>>  	u64 *spte;
>>>  	int need_tlb_flush = 0;
>>> @@ -763,8 +769,48 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
>>>  	return need_tlb_flush;
>>>  }
>>>  +static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
>>> +			     unsigned long data)
>>> +{
>>> +	int need_flush = 0;
>>> +	u64 *spte, new_spte;
>>> +	pte_t *ptep = (pte_t *)data;
>>> +	pfn_t new_pfn;
>>> +
>>> +	new_pfn = pte_pfn(ptep_val(ptep));
>>> +	spte = rmap_next(kvm, rmapp, NULL);
>>> +	while (spte) {
>>> +		BUG_ON(!is_shadow_present_pte(*spte));
>>> +		rmap_printk("kvm_set_pte_rmapp: spte %p %llx\n", spte, *spte);
>>> +		need_flush = 1;
>>> +		if (pte_write(ptep_val(ptep))) {
>>> +			rmap_remove(kvm, spte);
>>> +			__set_spte(spte, shadow_trap_nonpresent_pte);
>>> +			spte = rmap_next(kvm, rmapp, NULL);
>>> +		} else {
>>> +			new_spte = *spte &~ (PT64_BASE_ADDR_MASK);
>>> +			new_spte |= new_pfn << PAGE_SHIFT;
>>> +
>>> +			if (!pte_write(ptep_val(ptep))) {
>>> +				new_spte &= ~PT_WRITABLE_MASK;
>>> +				new_spte &= ~SPTE_HOST_WRITEABLE;
>>> +				if (is_writeble_pte(*spte))
>>> +					kvm_set_pfn_dirty(spte_to_pfn(*spte));
>>> +			}
>>> +			__set_spte(spte, new_spte);
>>> +			spte = rmap_next(kvm, rmapp, spte);
>>> +		}
>>> +	}
>>> +	if (need_flush)
>>> +		kvm_flush_remote_tlbs(kvm);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>  static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
>>> -			  int (*handler)(struct kvm *kvm, unsigned long *rmapp))
>>> +			  unsigned long data,
>>> +			  int (*handler)(struct kvm *kvm, unsigned long *rmapp,
>>> +					 unsigned long data))
>>>  {
>>>  	int i, j;
>>>  	int retval = 0;
>>> @@ -786,13 +832,15 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
>>>  		if (hva >= start && hva < end) {
>>>  			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
>>>  -			retval |= handler(kvm, &memslot->rmap[gfn_offset]);
>>> +			retval |= handler(kvm, &memslot->rmap[gfn_offset],
>>> +					  data);
>>>   			for (j = 0; j < KVM_NR_PAGE_SIZES - 1; ++j) {
>>>  				int idx = gfn_offset;
>>>  				idx /= KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL + j);
>>>  				retval |= handler(kvm,
>>> -					&memslot->lpage_info[j][idx].rmap_pde);
>>> +					&memslot->lpage_info[j][idx].rmap_pde,
>>> +					data);
>>>     
>>
>> If change_pte is called to modify a largepage pte, and the shadow has
>> that largepage mapped with 4k sptes, you'll set the wrong pfn. That is,
>> the patch does not attempt to handle different page sizes properly.
>>
>> So either disable change_pte updates to non-4k host vmas (making it
>> explictly it does not handle different pagesizes), or handle largepages
>> properly, although i don't see any use for change_pte or largepage
>> mappings?
>>   
>
> change_pte doesn't get called on 4k pages...
> So adding commet to this function saying it is working just on 4k pages  
> would be enough ?

It would be better to fail/WARN on non-4k host ptes (can't it?), but if
that is not possible I think a comment would be enough.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/3] add support for change_pte mmu notifiers
  2009-09-12 17:04             ` Izik Eidus
@ 2009-09-12 16:59               ` Marcelo Tosatti
  0 siblings, 0 replies; 24+ messages in thread
From: Marcelo Tosatti @ 2009-09-12 16:59 UTC (permalink / raw)
  To: Izik Eidus; +Cc: avi, kvm, aarcange

On Sat, Sep 12, 2009 at 08:04:31PM +0300, Izik Eidus wrote:
>>>> If change_pte is called to modify a largepage pte, and the shadow has
>>>> that largepage mapped with 4k sptes, you'll set the wrong pfn. That is,
>>>> the patch does not attempt to handle different page sizes properly.
>>>>
>>>> So either disable change_pte updates to non-4k host vmas (making it
>>>> explictly it does not handle different pagesizes), or handle largepages
>>>> properly, although i don't see any use for change_pte or largepage
>>>> mappings?
>>>>         
>>> change_pte doesn't get called on 4k pages...
>>> So adding commet to this function saying it is working just on 4k 
>>> pages  would be enough ?
>>>     
>>
>> It would be better to fail/WARN on non-4k host ptes (can't it?), but if
>> that is not possible i think a comment would be enough.
>>
>>   
> Ok so you want is_large_pte() check on data right? (and WARN() there..)

pte_huge(pte), yeah.
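
Something along these lines, presumably (a sketch of the guard being
discussed, not a tested patch):

	/* at the top of kvm_set_pte_rmapp(): */
	pte_t *ptep = (pte_t *)data;

	/* change_pte is only expected for 4k host ptes; the pfn rewrite
	 * below would be wrong for a largepage mapping */
	WARN_ON(pte_huge(ptep_val(ptep)));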



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/3] add support for change_pte mmu notifiers
  2009-09-12 16:18           ` Marcelo Tosatti
@ 2009-09-12 17:04             ` Izik Eidus
  2009-09-12 16:59               ` Marcelo Tosatti
  0 siblings, 1 reply; 24+ messages in thread
From: Izik Eidus @ 2009-09-12 17:04 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: avi, kvm, aarcange

Marcelo Tosatti wrote:
> On Sat, Sep 12, 2009 at 09:41:10AM +0300, Izik Eidus wrote:
>   
>> Marcelo Tosatti wrote:
>>     
>>> On Thu, Sep 10, 2009 at 07:38:58PM +0300, Izik Eidus wrote:
>>>   
>>>       
>>>> this is needed for kvm if it want ksm to directly map pages into its
>>>> shadow page tables.
>>>>
>>>> Signed-off-by: Izik Eidus <ieidus@redhat.com>
>>>> ---
>>>>  arch/x86/include/asm/kvm_host.h |    1 +
>>>>  arch/x86/kvm/mmu.c              |   70 ++++++++++++++++++++++++++++++++++----
>>>>  virt/kvm/kvm_main.c             |   14 ++++++++
>>>>  3 files changed, 77 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>> index 6046e6f..594d131 100644
>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>> @@ -797,6 +797,7 @@ asmlinkage void kvm_handle_fault_on_reboot(void);
>>>>  #define KVM_ARCH_WANT_MMU_NOTIFIER
>>>>  int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
>>>>  int kvm_age_hva(struct kvm *kvm, unsigned long hva);
>>>> +void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
>>>>  int cpuid_maxphyaddr(struct kvm_vcpu *vcpu);
>>>>  int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
>>>>  int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
>>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>>> index a7151b8..3fd19f2 100644
>>>> --- a/arch/x86/kvm/mmu.c
>>>> +++ b/arch/x86/kvm/mmu.c
>>>> @@ -282,6 +282,11 @@ static pfn_t spte_to_pfn(u64 pte)
>>>>  	return (pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
>>>>  }
>>>>  +static pte_t ptep_val(pte_t *ptep)
>>>> +{
>>>> +	return *ptep;
>>>> +}
>>>> +
>>>>  static gfn_t pse36_gfn_delta(u32 gpte)
>>>>  {
>>>>  	int shift = 32 - PT32_DIR_PSE36_SHIFT - PAGE_SHIFT;
>>>> @@ -748,7 +753,8 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
>>>>  	return write_protected;
>>>>  }
>>>>  -static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
>>>> +static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
>>>> +			   unsigned long data)
>>>>  {
>>>>  	u64 *spte;
>>>>  	int need_tlb_flush = 0;
>>>> @@ -763,8 +769,48 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
>>>>  	return need_tlb_flush;
>>>>  }
>>>>  +static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
>>>> +			     unsigned long data)
>>>> +{
>>>> +	int need_flush = 0;
>>>> +	u64 *spte, new_spte;
>>>> +	pte_t *ptep = (pte_t *)data;
>>>> +	pfn_t new_pfn;
>>>> +
>>>> +	new_pfn = pte_pfn(ptep_val(ptep));
>>>> +	spte = rmap_next(kvm, rmapp, NULL);
>>>> +	while (spte) {
>>>> +		BUG_ON(!is_shadow_present_pte(*spte));
>>>> +		rmap_printk("kvm_set_pte_rmapp: spte %p %llx\n", spte, *spte);
>>>> +		need_flush = 1;
>>>> +		if (pte_write(ptep_val(ptep))) {
>>>> +			rmap_remove(kvm, spte);
>>>> +			__set_spte(spte, shadow_trap_nonpresent_pte);
>>>> +			spte = rmap_next(kvm, rmapp, NULL);
>>>> +		} else {
>>>> +			new_spte = *spte &~ (PT64_BASE_ADDR_MASK);
>>>> +			new_spte |= new_pfn << PAGE_SHIFT;
>>>> +
>>>> +			if (!pte_write(ptep_val(ptep))) {
>>>> +				new_spte &= ~PT_WRITABLE_MASK;
>>>> +				new_spte &= ~SPTE_HOST_WRITEABLE;
>>>> +				if (is_writeble_pte(*spte))
>>>> +					kvm_set_pfn_dirty(spte_to_pfn(*spte));
>>>> +			}
>>>> +			__set_spte(spte, new_spte);
>>>> +			spte = rmap_next(kvm, rmapp, spte);
>>>> +		}
>>>> +	}
>>>> +	if (need_flush)
>>>> +		kvm_flush_remote_tlbs(kvm);
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>>  static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
>>>> -			  int (*handler)(struct kvm *kvm, unsigned long *rmapp))
>>>> +			  unsigned long data,
>>>> +			  int (*handler)(struct kvm *kvm, unsigned long *rmapp,
>>>> +					 unsigned long data))
>>>>  {
>>>>  	int i, j;
>>>>  	int retval = 0;
>>>> @@ -786,13 +832,15 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
>>>>  		if (hva >= start && hva < end) {
>>>>  			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
>>>>  -			retval |= handler(kvm, &memslot->rmap[gfn_offset]);
>>>> +			retval |= handler(kvm, &memslot->rmap[gfn_offset],
>>>> +					  data);
>>>>   			for (j = 0; j < KVM_NR_PAGE_SIZES - 1; ++j) {
>>>>  				int idx = gfn_offset;
>>>>  				idx /= KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL + j);
>>>>  				retval |= handler(kvm,
>>>> -					&memslot->lpage_info[j][idx].rmap_pde);
>>>> +					&memslot->lpage_info[j][idx].rmap_pde,
>>>> +					data);
>>>>     
>>>>         
>>> If change_pte is called to modify a largepage pte, and the shadow has
>>> that largepage mapped with 4k sptes, you'll set the wrong pfn. That is,
>>> the patch does not attempt to handle different page sizes properly.
>>>
>>> So either disable change_pte updates to non-4k host vmas (making it
>>> explictly it does not handle different pagesizes), or handle largepages
>>> properly, although i don't see any use for change_pte or largepage
>>> mappings?
>>>   
>>>       
>> change_pte doesn't get called on 4k pages...
>> So adding commet to this function saying it is working just on 4k pages  
>> would be enough ?
>>     
>
> It would be better to fail/WARN on non-4k host ptes (can't it?), but if
> that is not possible i think a comment would be enough.
>
>   
Ok, so you want an is_large_pte() check on data, right? (and WARN() there..)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] add SPTE_HOST_WRITEABLE flag to the shadow ptes
  2009-09-10 16:38   ` [PATCH 2/3] add SPTE_HOST_WRITEABLE flag to the shadow ptes Izik Eidus
  2009-09-10 16:38     ` [PATCH 3/3] add support for change_pte mmu notifiers Izik Eidus
  2009-09-11 21:14     ` [PATCH 2/3] add SPTE_HOST_WRITEABLE flag to the shadow ptes Marcelo Tosatti
@ 2009-09-14  5:18     ` Avi Kivity
  2009-09-14  8:34     ` Marcelo Tosatti
  3 siblings, 0 replies; 24+ messages in thread
From: Avi Kivity @ 2009-09-14  5:18 UTC (permalink / raw)
  To: Izik Eidus; +Cc: kvm, aarcange

On 09/10/2009 07:38 PM, Izik Eidus wrote:
> this flag notify that the host physical page we are pointing to from
> the spte is write protected, and therefore we cant change its access
> to be write unless we run get_user_pages(write = 1).
>
> (this is needed for change_pte support in kvm)
>
>
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 62d2f86..a7151b8 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -156,6 +156,8 @@ module_param(oos_shadow, bool, 0644);
>   #define CREATE_TRACE_POINTS
>   #include "mmutrace.h"
>
> +#define SPTE_HOST_WRITEABLE (1ULL<<  PT_FIRST_AVAIL_BITS_SHIFT)
> +
>    

Luckily, this bit is available on EPT too.

>   #define SHADOW_PT_INDEX(addr, level) PT64_I
> @@ -595,9 +600,16 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>
>   		nr_present++;
>   		pte_access = sp->role.access&  FNAME(gpte_access)(vcpu, gpte);
> +		if (!(sp->spt[i]&  SPTE_HOST_WRITEABLE)) {
> + 			pte_access&= ~PT_WRITABLE_MASK;
>    

pte_access uses ACC_ masks, not PT_ masks.
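
Presumably the intent is something like (sketch only):

	if (!(sp->spt[i] & SPTE_HOST_WRITEABLE)) {
		pte_access &= ~ACC_WRITE_MASK;
		reset_host_protection = 0;
	}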


-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] add SPTE_HOST_WRITEABLE flag to the shadow ptes
  2009-09-10 16:38   ` [PATCH 2/3] add SPTE_HOST_WRITEABLE flag to the shadow ptes Izik Eidus
                       ` (2 preceding siblings ...)
  2009-09-14  5:18     ` Avi Kivity
@ 2009-09-14  8:34     ` Marcelo Tosatti
  2009-09-14 16:51       ` Izik Eidus
  3 siblings, 1 reply; 24+ messages in thread
From: Marcelo Tosatti @ 2009-09-14  8:34 UTC (permalink / raw)
  To: Izik Eidus; +Cc: avi, kvm, aarcange

On Thu, Sep 10, 2009 at 07:38:57PM +0300, Izik Eidus wrote:
> this flag notify that the host physical page we are pointing to from
> the spte is write protected, and therefore we cant change its access
> to be write unless we run get_user_pages(write = 1).
> 
> (this is needed for change_pte support in kvm)
> 
> Signed-off-by: Izik Eidus <ieidus@redhat.com>
> ---
>  arch/x86/kvm/mmu.c         |   15 +++++++++++----
>  arch/x86/kvm/paging_tmpl.h |   18 +++++++++++++++---
>  2 files changed, 26 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 62d2f86..a7151b8 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -156,6 +156,8 @@ module_param(oos_shadow, bool, 0644);
>  #define CREATE_TRACE_POINTS
>  #include "mmutrace.h"
>  
> +#define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
> +
>  #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
>  
>  struct kvm_rmap_desc {
> @@ -1754,7 +1756,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  		    unsigned pte_access, int user_fault,
>  		    int write_fault, int dirty, int level,
>  		    gfn_t gfn, pfn_t pfn, bool speculative,
> -		    bool can_unsync)
> +		    bool can_unsync, bool reset_host_protection)
>  {
>  	u64 spte;
>  	int ret = 0;
> @@ -1781,6 +1783,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
>  			kvm_is_mmio_pfn(pfn));
>  
> +	if (reset_host_protection)
> +		spte |= SPTE_HOST_WRITEABLE;
> +
>  	spte |= (u64)pfn << PAGE_SHIFT;
>  
>  	if ((pte_access & ACC_WRITE_MASK)
> @@ -1826,7 +1831,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  			 unsigned pt_access, unsigned pte_access,
>  			 int user_fault, int write_fault, int dirty,
>  			 int *ptwrite, int level, gfn_t gfn,
> -			 pfn_t pfn, bool speculative)
> +			 pfn_t pfn, bool speculative,
> +			 bool reset_host_protection)
>  {
>  	int was_rmapped = 0;
>  	int was_writeble = is_writeble_pte(*sptep);
> @@ -1858,7 +1864,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  	}
>  
>  	if (set_spte(vcpu, sptep, pte_access, user_fault, write_fault,
> -		      dirty, level, gfn, pfn, speculative, true)) {
> +		      dirty, level, gfn, pfn, speculative, true,
> +		      reset_host_protection)) {
>  		if (write_fault)
>  			*ptwrite = 1;
>  		kvm_x86_ops->tlb_flush(vcpu);
> @@ -1906,7 +1913,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
>  		if (iterator.level == level) {
>  			mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, ACC_ALL,
>  				     0, write, 1, &pt_write,
> -				     level, gfn, pfn, false);
> +				     level, gfn, pfn, false, true);
>  			++vcpu->stat.pf_fixed;
>  			break;
>  		}
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index d2fec9c..c9256ee 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -273,9 +273,13 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page,
>  	if (mmu_notifier_retry(vcpu, vcpu->arch.update_pte.mmu_seq))
>  		return;
>  	kvm_get_pfn(pfn);
> + 	/*
> + 	 * we call mmu_set_spte() with reset_host_protection = true beacuse that
> + 	 * vcpu->arch.update_pte.pfn was fetched from get_user_pages(write = 1).
> + 	 */
>  	mmu_set_spte(vcpu, spte, page->role.access, pte_access, 0, 0,
>  		     gpte & PT_DIRTY_MASK, NULL, PT_PAGE_TABLE_LEVEL,
> -		     gpte_to_gfn(gpte), pfn, true);
> +		     gpte_to_gfn(gpte), pfn, true, true);
>  }
>  
>  /*
> @@ -308,7 +312,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
>  				     user_fault, write_fault,
>  				     gw->ptes[gw->level-1] & PT_DIRTY_MASK,
>  				     ptwrite, level,
> -				     gw->gfn, pfn, false);
> +				     gw->gfn, pfn, false, true);
>  			break;
>  		}
>  
> @@ -558,6 +562,7 @@ static void FNAME(prefetch_page)(struct kvm_vcpu *vcpu,
>  static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>  {
>  	int i, offset, nr_present;
> +        bool reset_host_protection;
>  
>  	offset = nr_present = 0;
>  
> @@ -595,9 +600,16 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
>  
>  		nr_present++;
>  		pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
> +		if (!(sp->spt[i] & SPTE_HOST_WRITEABLE)) {
> + 			pte_access &= ~PT_WRITABLE_MASK;
> +			reset_host_protection = 0;
> +		} else {
> +			reset_host_protection = 1;
> +		}
>  		set_spte(vcpu, &sp->spt[i], pte_access, 0, 0,
>  			 is_dirty_gpte(gpte), PT_PAGE_TABLE_LEVEL, gfn,
> -			 spte_to_pfn(sp->spt[i]), true, false);
> +			 spte_to_pfn(sp->spt[i]), true, false,
> +			 reset_host_protection);

Why can't you use the writable bit in the spte? So that you can only 
sync a writeable spte if it was writeable before, in sync_page?

Is there any other need for the extra bit?

>  	}
>  
>  	return !nr_present;
> -- 
> 1.5.6.5
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] add SPTE_HOST_WRITEABLE flag to the shadow ptes
  2009-09-14  8:34     ` Marcelo Tosatti
@ 2009-09-14 16:51       ` Izik Eidus
  0 siblings, 0 replies; 24+ messages in thread
From: Izik Eidus @ 2009-09-14 16:51 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: avi, kvm, aarcange

Marcelo Tosatti wrote:
> Why can't you use the writable bit in the spte? So that you can only 
> sync a writeable spte if it was writeable before, in sync_page?
>   

I could, but then we would add overhead for read-only gptes that become
writable in the guest...
If you prefer to take a fault when syncing a guest gpte that went from
read-only to writable, I can change it...

Which do you prefer?

> Is there any other need for the extra bit?
>   

No



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] kvm: dont hold pagecount reference for mapped sptes pages
  2009-09-23 18:47 ` [PATCH 1/3] kvm: dont hold pagecount reference for mapped sptes pages Izik Eidus
  2009-09-24 14:18   ` Marcelo Tosatti
@ 2009-09-24 16:56   ` Andrea Arcangeli
  1 sibling, 0 replies; 24+ messages in thread
From: Andrea Arcangeli @ 2009-09-24 16:56 UTC (permalink / raw)
  To: Izik Eidus; +Cc: avi, kvm, mtosatti

On Wed, Sep 23, 2009 at 09:47:16PM +0300, Izik Eidus wrote:
> When using mmu notifiers, we are allowed to remove the page count
> reference tooken by get_user_pages to a specific page that is mapped
> inside the shadow page tables.
> 
> This is needed so we can balance the pagecount against mapcount
> checking.
> 
> (Right now kvm increase the pagecount and does not increase the
> mapcount when mapping page into shadow page table entry,
> so when comparing pagecount against mapcount, you have no
> reliable result.)
> 
> Signed-off-by: Izik Eidus <ieidus@redhat.com>

Acked-by: Andrea Arcangeli <aarcange@redhat.com>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] kvm: dont hold pagecount reference for mapped sptes pages
  2009-09-23 18:47 ` [PATCH 1/3] kvm: dont hold pagecount reference for mapped sptes pages Izik Eidus
@ 2009-09-24 14:18   ` Marcelo Tosatti
  2009-09-24 16:56   ` Andrea Arcangeli
  1 sibling, 0 replies; 24+ messages in thread
From: Marcelo Tosatti @ 2009-09-24 14:18 UTC (permalink / raw)
  To: Izik Eidus; +Cc: avi, kvm, aarcange, Jan Kiszka


This needs compat code for the !MMU_NOTIFIERS case in kvm-kmod (Jan CC'ed).

Otherwise looks good.

On Wed, Sep 23, 2009 at 09:47:16PM +0300, Izik Eidus wrote:
> When using mmu notifiers, we are allowed to remove the page count
> reference tooken by get_user_pages to a specific page that is mapped
> inside the shadow page tables.
> 
> This is needed so we can balance the pagecount against mapcount
> checking.
> 
> (Right now kvm increase the pagecount and does not increase the
> mapcount when mapping page into shadow page table entry,
> so when comparing pagecount against mapcount, you have no
> reliable result.)
> 
> Signed-off-by: Izik Eidus <ieidus@redhat.com>
> ---
>  arch/x86/kvm/mmu.c |    7 ++-----
>  1 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index eca41ae..6c67b23 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -634,9 +634,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
>  	if (*spte & shadow_accessed_mask)
>  		kvm_set_pfn_accessed(pfn);
>  	if (is_writeble_pte(*spte))
> -		kvm_release_pfn_dirty(pfn);
> -	else
> -		kvm_release_pfn_clean(pfn);
> +		kvm_set_pfn_dirty(pfn);
>  	rmapp = gfn_to_rmap(kvm, sp->gfns[spte - sp->spt], sp->role.level);
>  	if (!*rmapp) {
>  		printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", spte, *spte);
> @@ -1877,8 +1875,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  	page_header_update_slot(vcpu->kvm, sptep, gfn);
>  	if (!was_rmapped) {
>  		rmap_count = rmap_add(vcpu, sptep, gfn);
> -		if (!is_rmap_spte(*sptep))
> -			kvm_release_pfn_clean(pfn);
> +		kvm_release_pfn_clean(pfn);
>  		if (rmap_count > RMAP_RECYCLE_THRESHOLD)
>  			rmap_recycle(vcpu, sptep, gfn);
>  	} else {
> -- 
> 1.5.6.5

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/3] kvm: dont hold pagecount reference for mapped sptes pages
  2009-09-23 18:47 [PATCH 0/3] kvm ksm support v3 Izik Eidus
@ 2009-09-23 18:47 ` Izik Eidus
  2009-09-24 14:18   ` Marcelo Tosatti
  2009-09-24 16:56   ` Andrea Arcangeli
  0 siblings, 2 replies; 24+ messages in thread
From: Izik Eidus @ 2009-09-23 18:47 UTC (permalink / raw)
  To: avi; +Cc: kvm, aarcange, mtosatti, Izik Eidus

When using mmu notifiers, we are allowed to drop the page count
reference taken by get_user_pages on a page that is mapped
inside the shadow page tables.

This is needed so that the pagecount stays balanced against the
mapcount for checks that compare the two.

(Right now kvm increases the pagecount but does not increase the
mapcount when mapping a page into a shadow page table entry,
so comparing pagecount against mapcount gives no
reliable result.)

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 arch/x86/kvm/mmu.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index eca41ae..6c67b23 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -634,9 +634,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
 	if (*spte & shadow_accessed_mask)
 		kvm_set_pfn_accessed(pfn);
 	if (is_writeble_pte(*spte))
-		kvm_release_pfn_dirty(pfn);
-	else
-		kvm_release_pfn_clean(pfn);
+		kvm_set_pfn_dirty(pfn);
 	rmapp = gfn_to_rmap(kvm, sp->gfns[spte - sp->spt], sp->role.level);
 	if (!*rmapp) {
 		printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", spte, *spte);
@@ -1877,8 +1875,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	page_header_update_slot(vcpu->kvm, sptep, gfn);
 	if (!was_rmapped) {
 		rmap_count = rmap_add(vcpu, sptep, gfn);
-		if (!is_rmap_spte(*sptep))
-			kvm_release_pfn_clean(pfn);
+		kvm_release_pfn_clean(pfn);
 		if (rmap_count > RMAP_RECYCLE_THRESHOLD)
 			rmap_recycle(vcpu, sptep, gfn);
 	} else {
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 24+ messages in thread
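
A rough sketch of the check the changelog above is protecting, assuming a
KSM-style comparison (the helper below is illustrative, not the actual
mm/ksm.c code): for a page that is only referenced through its mappings, the
page count and map count stay in balance, so an extra get_user_pages()
reference held for a shadow pte, with no matching mapcount increment, makes
the page look pinned to anyone doing that comparison.

#include <linux/mm.h>

/*
 * Illustrative only: a KSM-style "is this page referenced only through
 * its mappings?" test.  A shadow-pte reference taken by get_user_pages()
 * that never raised the mapcount makes this comparison fail even when
 * the page would otherwise be safe to merge.
 */
static bool page_refs_look_balanced(struct page *page)
{
	/* one reference per mapping, plus the one the caller holds */
	return page_count(page) == page_mapcount(page) + 1;
}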

* [PATCH 1/3] kvm: dont hold pagecount reference for mapped sptes pages
  2009-09-23 18:25 [PATCH 0/3] ksm support for kvm v2 Izik Eidus
@ 2009-09-23 18:25 ` Izik Eidus
  0 siblings, 0 replies; 24+ messages in thread
From: Izik Eidus @ 2009-09-23 18:25 UTC (permalink / raw)
  To: avi; +Cc: kvm, aarcange, mtosatti, Izik Eidus

When using mmu notifiers, we are allowed to drop the page count
reference taken by get_user_pages on a page that is mapped
inside the shadow page tables.

This is needed so that the pagecount stays balanced against the
mapcount for checks that compare the two.

(Right now kvm increases the pagecount but does not increase the
mapcount when mapping a page into a shadow page table entry,
so comparing pagecount against mapcount gives no
reliable result.)

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 arch/x86/kvm/mmu.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index eca41ae..6c67b23 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -634,9 +634,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
 	if (*spte & shadow_accessed_mask)
 		kvm_set_pfn_accessed(pfn);
 	if (is_writeble_pte(*spte))
-		kvm_release_pfn_dirty(pfn);
-	else
-		kvm_release_pfn_clean(pfn);
+		kvm_set_pfn_dirty(pfn);
 	rmapp = gfn_to_rmap(kvm, sp->gfns[spte - sp->spt], sp->role.level);
 	if (!*rmapp) {
 		printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", spte, *spte);
@@ -1877,8 +1875,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	page_header_update_slot(vcpu->kvm, sptep, gfn);
 	if (!was_rmapped) {
 		rmap_count = rmap_add(vcpu, sptep, gfn);
-		if (!is_rmap_spte(*sptep))
-			kvm_release_pfn_clean(pfn);
+		kvm_release_pfn_clean(pfn);
 		if (rmap_count > RMAP_RECYCLE_THRESHOLD)
 			rmap_recycle(vcpu, sptep, gfn);
 	} else {
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] kvm: dont hold pagecount reference for mapped sptes pages.
  2009-04-12  9:01       ` Izik Eidus
@ 2009-04-12  9:42         ` Avi Kivity
  0 siblings, 0 replies; 24+ messages in thread
From: Avi Kivity @ 2009-04-12  9:42 UTC (permalink / raw)
  To: Izik Eidus; +Cc: Marcelo Tosatti, kvm

Izik Eidus wrote:
> Izik Eidus wrote:
>> Marcelo Tosatti wrote:
>>> On Tue, Mar 31, 2009 at 03:00:02AM +0300, Izik Eidus wrote:
>>>  
>>>> When using mmu notifiers, we are allowed to drop the page count
>>>> reference taken by get_user_pages on a page that is mapped
>>>> inside the shadow page tables.
>>>>
>>>> This is needed so that the pagecount stays balanced against the
>>>> mapcount for checks that compare the two.
>>>>
>>>> (Right now kvm increases the pagecount but does not increase the
>>>> mapcount when mapping a page into a shadow page table entry,
>>>> so comparing pagecount against mapcount gives no
>>>> reliable result.)
>>>>     
>>>
>>> IMO ifdef'ing CONFIG_MMU_NOTIFIERS here (and keeping the ref if unset)
>>> instead of in the backward compat code gives less room for headaches.
>>>
>>>   
>> That was the first version of this patch; Avi preferred not to do it...
>>
> Avi, do you mind if I change it to use the ifdef?

No, let's not have too many ifdefs in the code.  I'll hack it in 
hack-module.awk.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 24+ messages in thread
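
For reference, a sketch of the in-tree ifdef variant that Marcelo suggests
and Avi declines in this subthread, built only from the rmap_remove() hunk of
the patch itself (the exact config symbol spelling is an assumption here, and
this is not what was merged; the no-notifier behavior was left to kvm-kmod's
hack-module.awk instead):

#ifdef CONFIG_MMU_NOTIFIER
	/* mmu notifiers will tear the mapping down: no reference is held,
	 * so only propagate the dirty state. */
	if (is_writeble_pte(*spte))
		kvm_set_pfn_dirty(pfn);
#else
	/* no notifiers: the spte still holds the get_user_pages() reference,
	 * release it here as before. */
	if (is_writeble_pte(*spte))
		kvm_release_pfn_dirty(pfn);
	else
		kvm_release_pfn_clean(pfn);
#endif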

* Re: [PATCH 1/3] kvm: dont hold pagecount reference for mapped sptes pages.
  2009-04-09 12:52     ` Izik Eidus
@ 2009-04-12  9:01       ` Izik Eidus
  2009-04-12  9:42         ` Avi Kivity
  0 siblings, 1 reply; 24+ messages in thread
From: Izik Eidus @ 2009-04-12  9:01 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, avi

Izik Eidus wrote:
> Marcelo Tosatti wrote:
>> On Tue, Mar 31, 2009 at 03:00:02AM +0300, Izik Eidus wrote:
>>  
>>> When using mmu notifiers, we are allowed to drop the page count
>>> reference taken by get_user_pages on a page that is mapped
>>> inside the shadow page tables.
>>>
>>> This is needed so that the pagecount stays balanced against the
>>> mapcount for checks that compare the two.
>>>
>>> (Right now kvm increases the pagecount but does not increase the
>>> mapcount when mapping a page into a shadow page table entry,
>>> so comparing pagecount against mapcount gives no
>>> reliable result.)
>>>     
>>
>> IMO ifdef'ing CONFIG_MMU_NOTIFIERS here (and keeping the ref if unset)
>> instead of in the backward compat code gives less room for headaches.
>>
>>   
> That was the first version of this patch; Avi preferred not to do it...
>
Avi, do you mind if I change it to use the ifdef?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] kvm: dont hold pagecount reference for mapped sptes pages.
  2009-04-09  9:54   ` Marcelo Tosatti
@ 2009-04-09 12:52     ` Izik Eidus
  2009-04-12  9:01       ` Izik Eidus
  0 siblings, 1 reply; 24+ messages in thread
From: Izik Eidus @ 2009-04-09 12:52 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: kvm, avi

Marcelo Tosatti wrote:
> On Tue, Mar 31, 2009 at 03:00:02AM +0300, Izik Eidus wrote:
>   
>> When using mmu notifiers, we are allowed to drop the page count
>> reference taken by get_user_pages on a page that is mapped
>> inside the shadow page tables.
>>
>> This is needed so that the pagecount stays balanced against the
>> mapcount for checks that compare the two.
>>
>> (Right now kvm increases the pagecount but does not increase the
>> mapcount when mapping a page into a shadow page table entry,
>> so comparing pagecount against mapcount gives no
>> reliable result.)
>>     
>
> IMO ifdef'ing CONFIG_MMU_NOTIFIERS here (and keeping the ref if unset)
> instead of in the backward compat code gives less room for headaches.
>
>   
That was the first version of this patch; Avi preferred not to do it...

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] kvm: dont hold pagecount reference for mapped sptes pages.
  2009-03-31  0:00   ` Izik Eidus
  (?)
@ 2009-04-09  9:54   ` Marcelo Tosatti
  2009-04-09 12:52     ` Izik Eidus
  -1 siblings, 1 reply; 24+ messages in thread
From: Marcelo Tosatti @ 2009-04-09  9:54 UTC (permalink / raw)
  To: Izik Eidus; +Cc: kvm, avi

On Tue, Mar 31, 2009 at 03:00:02AM +0300, Izik Eidus wrote:
> When using mmu notifiers, we are allowed to drop the page count
> reference taken by get_user_pages on a page that is mapped
> inside the shadow page tables.
> 
> This is needed so that the pagecount stays balanced against the
> mapcount for checks that compare the two.
> 
> (Right now kvm increases the pagecount but does not increase the
> mapcount when mapping a page into a shadow page table entry,
> so comparing pagecount against mapcount gives no
> reliable result.)

IMO ifdef'ing CONFIG_MMU_NOTIFIERS here (and keeping the ref if unset)
instead of in the backward compat code gives less room for headaches.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/3] kvm: dont hold pagecount reference for mapped sptes pages.
  2009-03-31  0:00 [PATCH 0/3] kvm support for ksm Izik Eidus
@ 2009-03-31  0:00   ` Izik Eidus
  0 siblings, 0 replies; 24+ messages in thread
From: Izik Eidus @ 2009-03-31  0:00 UTC (permalink / raw)
  Cc: linux-kernel, kvm, linux-mm, avi, aarcange, chrisw, riel, jeremy,
	mtosatti, hugh, corbet, yaniv, dmonakhov, Izik Eidus

When using mmu notifiers, we are allowed to drop the page count
reference taken by get_user_pages on a page that is mapped
inside the shadow page tables.

This is needed so that the pagecount stays balanced against the
mapcount for checks that compare the two.

(Right now kvm increases the pagecount but does not increase the
mapcount when mapping a page into a shadow page table entry,
so comparing pagecount against mapcount gives no
reliable result.)

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 arch/x86/kvm/mmu.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b625ed4..df8fbaf 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -567,9 +567,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
 	if (*spte & shadow_accessed_mask)
 		kvm_set_pfn_accessed(pfn);
 	if (is_writeble_pte(*spte))
-		kvm_release_pfn_dirty(pfn);
-	else
-		kvm_release_pfn_clean(pfn);
+		kvm_set_pfn_dirty(pfn);
 	rmapp = gfn_to_rmap(kvm, sp->gfns[spte - sp->spt], is_large_pte(*spte));
 	if (!*rmapp) {
 		printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", spte, *spte);
@@ -1812,8 +1810,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 	page_header_update_slot(vcpu->kvm, shadow_pte, gfn);
 	if (!was_rmapped) {
 		rmap_add(vcpu, shadow_pte, gfn, largepage);
-		if (!is_rmap_pte(*shadow_pte))
-			kvm_release_pfn_clean(pfn);
+		kvm_release_pfn_clean(pfn);
 	} else {
 		if (was_writeble)
 			kvm_release_pfn_dirty(pfn);
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 1/3] kvm: dont hold pagecount reference for mapped sptes pages.
@ 2009-03-31  0:00   ` Izik Eidus
  0 siblings, 0 replies; 24+ messages in thread
From: Izik Eidus @ 2009-03-31  0:00 UTC (permalink / raw)
  Cc: linux-kernel, kvm, linux-mm, avi, aarcange, chrisw, riel, jeremy,
	mtosatti, hugh, corbet, yaniv, dmonakhov, Izik Eidus

When using mmu notifiers, we are allowed to drop the page count
reference taken by get_user_pages on a page that is mapped
inside the shadow page tables.

This is needed so that the pagecount stays balanced against the
mapcount for checks that compare the two.

(Right now kvm increases the pagecount but does not increase the
mapcount when mapping a page into a shadow page table entry,
so comparing pagecount against mapcount gives no
reliable result.)

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 arch/x86/kvm/mmu.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b625ed4..df8fbaf 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -567,9 +567,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
 	if (*spte & shadow_accessed_mask)
 		kvm_set_pfn_accessed(pfn);
 	if (is_writeble_pte(*spte))
-		kvm_release_pfn_dirty(pfn);
-	else
-		kvm_release_pfn_clean(pfn);
+		kvm_set_pfn_dirty(pfn);
 	rmapp = gfn_to_rmap(kvm, sp->gfns[spte - sp->spt], is_large_pte(*spte));
 	if (!*rmapp) {
 		printk(KERN_ERR "rmap_remove: %p %llx 0->BUG\n", spte, *spte);
@@ -1812,8 +1810,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
 	page_header_update_slot(vcpu->kvm, shadow_pte, gfn);
 	if (!was_rmapped) {
 		rmap_add(vcpu, shadow_pte, gfn, largepage);
-		if (!is_rmap_pte(*shadow_pte))
-			kvm_release_pfn_clean(pfn);
+		kvm_release_pfn_clean(pfn);
 	} else {
 		if (was_writeble)
 			kvm_release_pfn_dirty(pfn);
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2009-09-24 16:56 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-10 16:38 [PATCH 0/3] ksm support for kvm Izik Eidus
2009-09-10 16:38 ` [PATCH 1/3] kvm: dont hold pagecount reference for mapped sptes pages Izik Eidus
2009-09-10 16:38   ` [PATCH 2/3] add SPTE_HOST_WRITEABLE flag to the shadow ptes Izik Eidus
2009-09-10 16:38     ` [PATCH 3/3] add support for change_pte mmu notifiers Izik Eidus
2009-09-11 21:24       ` Marcelo Tosatti
2009-09-12  6:41         ` Izik Eidus
2009-09-12 16:18           ` Marcelo Tosatti
2009-09-12 17:04             ` Izik Eidus
2009-09-12 16:59               ` Marcelo Tosatti
2009-09-11 21:14     ` [PATCH 2/3] add SPTE_HOST_WRITEABLE flag to the shadow ptes Marcelo Tosatti
2009-09-12  6:35       ` Izik Eidus
2009-09-14  5:18     ` Avi Kivity
2009-09-14  8:34     ` Marcelo Tosatti
2009-09-14 16:51       ` Izik Eidus
  -- strict thread matches above, loose matches on Subject: below --
2009-09-23 18:47 [PATCH 0/3] kvm ksm support v3 Izik Eidus
2009-09-23 18:47 ` [PATCH 1/3] kvm: dont hold pagecount reference for mapped sptes pages Izik Eidus
2009-09-24 14:18   ` Marcelo Tosatti
2009-09-24 16:56   ` Andrea Arcangeli
2009-09-23 18:25 [PATCH 0/3] ksm support for kvm v2 Izik Eidus
2009-09-23 18:25 ` [PATCH 1/3] kvm: dont hold pagecount reference for mapped sptes pages Izik Eidus
2009-03-31  0:00 [PATCH 0/3] kvm support for ksm Izik Eidus
2009-03-31  0:00 ` [PATCH 1/3] kvm: dont hold pagecount reference for mapped sptes pages Izik Eidus
2009-03-31  0:00   ` Izik Eidus
2009-04-09  9:54   ` Marcelo Tosatti
2009-04-09 12:52     ` Izik Eidus
2009-04-12  9:01       ` Izik Eidus
2009-04-12  9:42         ` Avi Kivity
