linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] kvm/x86: Keep root hpa in prev_roots as much as possible
@ 2021-05-25 21:39 Lai Jiangshan
  2021-07-29 18:05 ` Sean Christopherson
  0 siblings, 1 reply; 4+ messages in thread
From: Lai Jiangshan @ 2021-05-25 21:39 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Lai Jiangshan, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson

From: Lai Jiangshan <laijs@linux.alibaba.com>

Pagetable roots in prev_roots[] are likely to be reused soon and
there is no much overhead to keep it with a new need_sync field
introduced.

With the help of the new need_sync field, pagetable roots are
kept as much as possible, and they will be re-synced before reused
instead of being dropped.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
---

This patch is just for RFC.
  Is the idea Ok?
  If the idea is Ok, we need to reused one bit from pgd or hpa
    as need_sync to save memory.  Which one is better?

 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/kvm/mmu/mmu.c          |  6 ++++++
 arch/x86/kvm/vmx/nested.c       | 12 ++++--------
 arch/x86/kvm/x86.c              |  9 +++++----
 4 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 55efbacfc244..19a337cf7aa6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -354,10 +354,11 @@ struct rsvd_bits_validate {
 struct kvm_mmu_root_info {
 	gpa_t pgd;
 	hpa_t hpa;
+	bool need_sync;
 };
 
 #define KVM_MMU_ROOT_INFO_INVALID \
-	((struct kvm_mmu_root_info) { .pgd = INVALID_PAGE, .hpa = INVALID_PAGE })
+	((struct kvm_mmu_root_info) { .pgd = INVALID_PAGE, .hpa = INVALID_PAGE, .need_sync = true})
 
 #define KVM_MMU_NUM_PREV_ROOTS 3
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5e60b00e8e50..147827135549 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3878,6 +3878,7 @@ static bool cached_root_available(struct kvm_vcpu *vcpu, gpa_t new_pgd,
 
 	root.pgd = mmu->root_pgd;
 	root.hpa = mmu->root_hpa;
+	root.need_sync = false;
 
 	if (is_root_usable(&root, new_pgd, new_role))
 		return true;
@@ -3892,6 +3893,11 @@ static bool cached_root_available(struct kvm_vcpu *vcpu, gpa_t new_pgd,
 	mmu->root_hpa = root.hpa;
 	mmu->root_pgd = root.pgd;
 
+	if (i < KVM_MMU_NUM_PREV_ROOTS && root.need_sync) {
+		kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
+		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
+	}
+
 	return i < KVM_MMU_NUM_PREV_ROOTS;
 }
 
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 6058a65a6ede..ab7069ac6dc5 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5312,7 +5312,7 @@ static int handle_invept(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 vmx_instruction_info, types;
-	unsigned long type, roots_to_free;
+	unsigned long type;
 	struct kvm_mmu *mmu;
 	gva_t gva;
 	struct x86_exception e;
@@ -5361,29 +5361,25 @@ static int handle_invept(struct kvm_vcpu *vcpu)
 			return nested_vmx_fail(vcpu,
 				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
 
-		roots_to_free = 0;
 		if (nested_ept_root_matches(mmu->root_hpa, mmu->root_pgd,
 					    operand.eptp))
-			roots_to_free |= KVM_MMU_ROOT_CURRENT;
+			kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOT_CURRENT);
 
 		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
 			if (nested_ept_root_matches(mmu->prev_roots[i].hpa,
 						    mmu->prev_roots[i].pgd,
 						    operand.eptp))
-				roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
+				mmu->prev_roots[i].need_sync = true;
 		}
 		break;
 	case VMX_EPT_EXTENT_GLOBAL:
-		roots_to_free = KVM_MMU_ROOTS_ALL;
+		kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOTS_ALL);
 		break;
 	default:
 		BUG();
 		break;
 	}
 
-	if (roots_to_free)
-		kvm_mmu_free_roots(vcpu, mmu, roots_to_free);
-
 	return nested_vmx_succeed(vcpu);
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bbc4e04e67ad..1f5617ec6b34 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11680,7 +11680,6 @@ int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
 	bool pcid_enabled;
 	struct x86_exception e;
 	unsigned i;
-	unsigned long roots_to_free = 0;
 	struct {
 		u64 pcid;
 		u64 gla;
@@ -11722,9 +11721,8 @@ int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
 		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
 			if (kvm_get_pcid(vcpu, vcpu->arch.mmu->prev_roots[i].pgd)
 			    == operand.pcid)
-				roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
+				vcpu->arch.mmu->prev_roots[i].need_sync = true;
 
-		kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, roots_to_free);
 		/*
 		 * If neither the current cr3 nor any of the prev_roots use the
 		 * given PCID, then nothing needs to be done here because a
@@ -11743,7 +11741,10 @@ int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
 
 		fallthrough;
 	case INVPCID_TYPE_ALL_INCL_GLOBAL:
-		kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
+		kvm_mmu_sync_roots(vcpu);
+		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
+		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
+			vcpu->arch.mmu->prev_roots[i].need_sync = true;
 		return kvm_skip_emulated_instruction(vcpu);
 
 	default:
-- 
2.19.1.6.gb485710b


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] kvm/x86: Keep root hpa in prev_roots as much as possible
  2021-05-25 21:39 [RFC PATCH] kvm/x86: Keep root hpa in prev_roots as much as possible Lai Jiangshan
@ 2021-07-29 18:05 ` Sean Christopherson
  2021-08-03  1:19   ` Lai Jiangshan
  0 siblings, 1 reply; 4+ messages in thread
From: Sean Christopherson @ 2021-07-29 18:05 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, kvm, Paolo Bonzini, Lai Jiangshan,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson

On Wed, May 26, 2021, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@linux.alibaba.com>
> 
> Pagetable roots in prev_roots[] are likely to be reused soon and
> there is no much overhead to keep it with a new need_sync field
> introduced.
> 
> With the help of the new need_sync field, pagetable roots are
> kept as much as possible, and they will be re-synced before reused
> instead of being dropped.
> 
> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
> ---
> 
> This patch is just for RFC.
>   Is the idea Ok?

Yes, the idea is definitely a good one.

>   If the idea is Ok, we need to reused one bit from pgd or hpa
>     as need_sync to save memory.  Which one is better?

Ha, we can do this without increasing the memory footprint and without co-opting
a bit from pgd or hpa.  Because of compiler alignment/padding, the u8s and bools
between mmu_role and prev_roots already occupy 8 bytes, even though the actual
size is 4 bytes.  In total, we need room for 4 roots (3 previous + current), i.e.
4 bytes.  If a separate array is used, no additional memory is consumed and no
masking is needed when reading/writing e.g. pgd.

The cost is an extra swap() when updating the prev_roots LRU, but that's peanuts
and would likely be offset by masking anyways.

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 99f37781a6fc..13bb3c3a60b4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -424,10 +424,12 @@ struct kvm_mmu {
        hpa_t root_hpa;
        gpa_t root_pgd;
        union kvm_mmu_role mmu_role;
+       bool root_unsync;
        u8 root_level;
        u8 shadow_root_level;
        u8 ept_ad;
        bool direct_map;
+       bool unsync_roots[KVM_MMU_NUM_PREV_ROOTS];
        struct kvm_mmu_root_info prev_roots[KVM_MMU_NUM_PREV_ROOTS];

        /*


>  arch/x86/include/asm/kvm_host.h |  3 ++-
>  arch/x86/kvm/mmu/mmu.c          |  6 ++++++
>  arch/x86/kvm/vmx/nested.c       | 12 ++++--------
>  arch/x86/kvm/x86.c              |  9 +++++----
>  4 files changed, 17 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 55efbacfc244..19a337cf7aa6 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -354,10 +354,11 @@ struct rsvd_bits_validate {
>  struct kvm_mmu_root_info {
>  	gpa_t pgd;
>  	hpa_t hpa;
> +	bool need_sync;

Hmm, use "unsync" instead of "need_sync", purely to match the existing terminology
in KVM's MMU for this sort of behavior.

>  };
>  
>  #define KVM_MMU_ROOT_INFO_INVALID \
> -	((struct kvm_mmu_root_info) { .pgd = INVALID_PAGE, .hpa = INVALID_PAGE })
> +	((struct kvm_mmu_root_info) { .pgd = INVALID_PAGE, .hpa = INVALID_PAGE, .need_sync = true})
>  
>  #define KVM_MMU_NUM_PREV_ROOTS 3
>  
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 5e60b00e8e50..147827135549 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3878,6 +3878,7 @@ static bool cached_root_available(struct kvm_vcpu *vcpu, gpa_t new_pgd,
>  
>  	root.pgd = mmu->root_pgd;
>  	root.hpa = mmu->root_hpa;
> +	root.need_sync = false;
>  
>  	if (is_root_usable(&root, new_pgd, new_role))
>  		return true;
> @@ -3892,6 +3893,11 @@ static bool cached_root_available(struct kvm_vcpu *vcpu, gpa_t new_pgd,
>  	mmu->root_hpa = root.hpa;
>  	mmu->root_pgd = root.pgd;
>  
> +	if (i < KVM_MMU_NUM_PREV_ROOTS && root.need_sync) {

Probably makes sense to write this as:

	if (i >= KVM_MMU_NUM_PREV_ROOTS)
		return false;

	if (root.need_sync) {
		kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
	}
	return true;

The "i < KVM_MMU_NUM_PREV_ROOTS == success" logic is just confusing enough that
it'd be nice to write it only once.

And that would also play nicely with deferring a sync for the "current" root
(see below), e.g.

	...
	unsync = mmu->root_unsync;

	if (is_root_usable(&root, new_pgd, new_role))
		goto found_root;

	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
		swap(root, mmu->prev_roots[i]);
		swap(unsync, mmu->unsync_roots[i]);

		if (is_root_usable(&root, new_pgd, new_role))
			break;
	}

        if (i >= KVM_MMU_NUM_PREV_ROOTS)
                return false;

found_root:
        if (unsync) {
                kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
                kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
        }
        return true;

> +		kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
> +		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> +	}
> +
>  	return i < KVM_MMU_NUM_PREV_ROOTS;
>  }
>  
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 6058a65a6ede..ab7069ac6dc5 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -5312,7 +5312,7 @@ static int handle_invept(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	u32 vmx_instruction_info, types;
> -	unsigned long type, roots_to_free;
> +	unsigned long type;
>  	struct kvm_mmu *mmu;
>  	gva_t gva;
>  	struct x86_exception e;
> @@ -5361,29 +5361,25 @@ static int handle_invept(struct kvm_vcpu *vcpu)
>  			return nested_vmx_fail(vcpu,
>  				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
>  
> -		roots_to_free = 0;
>  		if (nested_ept_root_matches(mmu->root_hpa, mmu->root_pgd,
>  					    operand.eptp))
> -			roots_to_free |= KVM_MMU_ROOT_CURRENT;
> +			kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOT_CURRENT);

For a non-RFC series, I think this should do two things:

  1. Separate INVEPT from INVPCID, i.e. do only INVPCID first.
  2. Enhance INVEPT to SYNC+FLUSH the current root instead of freeing it

As alluded to above, this can be done by deferring the sync+flush (which can't
be done right away because INVEPT runs in L1 context, whereas KVM needs to sync+flush
L2 EPT context).

>  		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
>  			if (nested_ept_root_matches(mmu->prev_roots[i].hpa,
>  						    mmu->prev_roots[i].pgd,
>  						    operand.eptp))
> -				roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
> +				mmu->prev_roots[i].need_sync = true;
>  		}
>  		break;
>  	case VMX_EPT_EXTENT_GLOBAL:
> -		roots_to_free = KVM_MMU_ROOTS_ALL;
> +		kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOTS_ALL);
>  		break;
>  	default:
>  		BUG();
>  		break;
>  	}
>  
> -	if (roots_to_free)
> -		kvm_mmu_free_roots(vcpu, mmu, roots_to_free);
> -
>  	return nested_vmx_succeed(vcpu);
>  }

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] kvm/x86: Keep root hpa in prev_roots as much as possible
  2021-07-29 18:05 ` Sean Christopherson
@ 2021-08-03  1:19   ` Lai Jiangshan
  2021-08-03 15:57     ` Sean Christopherson
  0 siblings, 1 reply; 4+ messages in thread
From: Lai Jiangshan @ 2021-08-03  1:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: LKML, kvm, Paolo Bonzini, Lai Jiangshan, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson

On Fri, Jul 30, 2021 at 2:06 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, May 26, 2021, Lai Jiangshan wrote:
> > From: Lai Jiangshan <laijs@linux.alibaba.com>
> >
> > Pagetable roots in prev_roots[] are likely to be reused soon and
> > there is no much overhead to keep it with a new need_sync field
> > introduced.
> >
> > With the help of the new need_sync field, pagetable roots are
> > kept as much as possible, and they will be re-synced before reused
> > instead of being dropped.
> >
> > Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
> > ---
> >
> > This patch is just for RFC.
> >   Is the idea Ok?
>
> Yes, the idea is definitely a good one.
>
> >   If the idea is Ok, we need to reused one bit from pgd or hpa
> >     as need_sync to save memory.  Which one is better?
>
> Ha, we can do this without increasing the memory footprint and without co-opting
> a bit from pgd or hpa.  Because of compiler alignment/padding, the u8s and bools
> between mmu_role and prev_roots already occupy 8 bytes, even though the actual
> size is 4 bytes.  In total, we need room for 4 roots (3 previous + current), i.e.
> 4 bytes.  If a separate array is used, no additional memory is consumed and no
> masking is needed when reading/writing e.g. pgd.
>
> The cost is an extra swap() when updating the prev_roots LRU, but that's peanuts
> and would likely be offset by masking anyways.
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 99f37781a6fc..13bb3c3a60b4 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -424,10 +424,12 @@ struct kvm_mmu {
>         hpa_t root_hpa;
>         gpa_t root_pgd;
>         union kvm_mmu_role mmu_role;
> +       bool root_unsync;
>         u8 root_level;
>         u8 shadow_root_level;
>         u8 ept_ad;
>         bool direct_map;
> +       bool unsync_roots[KVM_MMU_NUM_PREV_ROOTS];
>         struct kvm_mmu_root_info prev_roots[KVM_MMU_NUM_PREV_ROOTS];
>

Hello

I think it is too complicated.  And it is hard to accept to put "unsync"
out of struct kvm_mmu_root_info when they should be bound to each other.

How about this:
- KVM_MMU_NUM_PREV_ROOTS
+ KVM_MMU_NUM_CACHED_ROOTS
- mmu->prev_roots[KVM_MMU_NUM_PREV_ROOTS]
+ mmu->cached_roots[KVM_MMU_NUM_CACHED_ROOTS]
- mmu->root_hpa
+ mmu->cached_roots[0].hpa
- mmu->root_pgd
+ mmu->cached_roots[0].pgd

And using the bit63 in @pgd as the information that it is not requested
to sync since the last sync.

Thanks
Lai.

>         /*
>
>
> >  arch/x86/include/asm/kvm_host.h |  3 ++-
> >  arch/x86/kvm/mmu/mmu.c          |  6 ++++++
> >  arch/x86/kvm/vmx/nested.c       | 12 ++++--------
> >  arch/x86/kvm/x86.c              |  9 +++++----
> >  4 files changed, 17 insertions(+), 13 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 55efbacfc244..19a337cf7aa6 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -354,10 +354,11 @@ struct rsvd_bits_validate {
> >  struct kvm_mmu_root_info {
> >       gpa_t pgd;
> >       hpa_t hpa;
> > +     bool need_sync;
>
> Hmm, use "unsync" instead of "need_sync", purely to match the existing terminology
> in KVM's MMU for this sort of behavior.
>
> >  };
> >
> >  #define KVM_MMU_ROOT_INFO_INVALID \
> > -     ((struct kvm_mmu_root_info) { .pgd = INVALID_PAGE, .hpa = INVALID_PAGE })
> > +     ((struct kvm_mmu_root_info) { .pgd = INVALID_PAGE, .hpa = INVALID_PAGE, .need_sync = true})
> >
> >  #define KVM_MMU_NUM_PREV_ROOTS 3
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 5e60b00e8e50..147827135549 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -3878,6 +3878,7 @@ static bool cached_root_available(struct kvm_vcpu *vcpu, gpa_t new_pgd,
> >
> >       root.pgd = mmu->root_pgd;
> >       root.hpa = mmu->root_hpa;
> > +     root.need_sync = false;
> >
> >       if (is_root_usable(&root, new_pgd, new_role))
> >               return true;
> > @@ -3892,6 +3893,11 @@ static bool cached_root_available(struct kvm_vcpu *vcpu, gpa_t new_pgd,
> >       mmu->root_hpa = root.hpa;
> >       mmu->root_pgd = root.pgd;
> >
> > +     if (i < KVM_MMU_NUM_PREV_ROOTS && root.need_sync) {
>
> Probably makes sense to write this as:
>
>         if (i >= KVM_MMU_NUM_PREV_ROOTS)
>                 return false;
>
>         if (root.need_sync) {
>                 kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
>                 kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
>         }
>         return true;
>
> The "i < KVM_MMU_NUM_PREV_ROOTS == success" logic is just confusing enough that
> it'd be nice to write it only once.
>
> And that would also play nicely with deferring a sync for the "current" root
> (see below), e.g.
>
>         ...
>         unsync = mmu->root_unsync;
>
>         if (is_root_usable(&root, new_pgd, new_role))
>                 goto found_root;
>
>         for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
>                 swap(root, mmu->prev_roots[i]);
>                 swap(unsync, mmu->unsync_roots[i]);
>
>                 if (is_root_usable(&root, new_pgd, new_role))
>                         break;
>         }
>
>         if (i >= KVM_MMU_NUM_PREV_ROOTS)
>                 return false;
>
> found_root:
>         if (unsync) {
>                 kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
>                 kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
>         }
>         return true;
>
> > +             kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
> > +             kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> > +     }
> > +
> >       return i < KVM_MMU_NUM_PREV_ROOTS;
> >  }
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index 6058a65a6ede..ab7069ac6dc5 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -5312,7 +5312,7 @@ static int handle_invept(struct kvm_vcpu *vcpu)
> >  {
> >       struct vcpu_vmx *vmx = to_vmx(vcpu);
> >       u32 vmx_instruction_info, types;
> > -     unsigned long type, roots_to_free;
> > +     unsigned long type;
> >       struct kvm_mmu *mmu;
> >       gva_t gva;
> >       struct x86_exception e;
> > @@ -5361,29 +5361,25 @@ static int handle_invept(struct kvm_vcpu *vcpu)
> >                       return nested_vmx_fail(vcpu,
> >                               VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
> >
> > -             roots_to_free = 0;
> >               if (nested_ept_root_matches(mmu->root_hpa, mmu->root_pgd,
> >                                           operand.eptp))
> > -                     roots_to_free |= KVM_MMU_ROOT_CURRENT;
> > +                     kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOT_CURRENT);
>
> For a non-RFC series, I think this should do two things:
>
>   1. Separate INVEPT from INVPCID, i.e. do only INVPCID first.
>   2. Enhance INVEPT to SYNC+FLUSH the current root instead of freeing it
>
> As alluded to above, this can be done by deferring the sync+flush (which can't
> be done right away because INVEPT runs in L1 context, whereas KVM needs to sync+flush
> L2 EPT context).
>
> >               for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
> >                       if (nested_ept_root_matches(mmu->prev_roots[i].hpa,
> >                                                   mmu->prev_roots[i].pgd,
> >                                                   operand.eptp))
> > -                             roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
> > +                             mmu->prev_roots[i].need_sync = true;
> >               }
> >               break;
> >       case VMX_EPT_EXTENT_GLOBAL:
> > -             roots_to_free = KVM_MMU_ROOTS_ALL;
> > +             kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOTS_ALL);
> >               break;
> >       default:
> >               BUG();
> >               break;
> >       }
> >
> > -     if (roots_to_free)
> > -             kvm_mmu_free_roots(vcpu, mmu, roots_to_free);
> > -
> >       return nested_vmx_succeed(vcpu);
> >  }

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] kvm/x86: Keep root hpa in prev_roots as much as possible
  2021-08-03  1:19   ` Lai Jiangshan
@ 2021-08-03 15:57     ` Sean Christopherson
  0 siblings, 0 replies; 4+ messages in thread
From: Sean Christopherson @ 2021-08-03 15:57 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: LKML, kvm, Paolo Bonzini, Lai Jiangshan, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson

On Tue, Aug 03, 2021, Lai Jiangshan wrote:
> On Fri, Jul 30, 2021 at 2:06 AM Sean Christopherson <seanjc@google.com> wrote:
> > Ha, we can do this without increasing the memory footprint and without co-opting
> > a bit from pgd or hpa.  Because of compiler alignment/padding, the u8s and bools
> > between mmu_role and prev_roots already occupy 8 bytes, even though the actual
> > size is 4 bytes.  In total, we need room for 4 roots (3 previous + current), i.e.
> > 4 bytes.  If a separate array is used, no additional memory is consumed and no
> > masking is needed when reading/writing e.g. pgd.
> >
> > The cost is an extra swap() when updating the prev_roots LRU, but that's peanuts
> > and would likely be offset by masking anyways.
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 99f37781a6fc..13bb3c3a60b4 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -424,10 +424,12 @@ struct kvm_mmu {
> >         hpa_t root_hpa;
> >         gpa_t root_pgd;
> >         union kvm_mmu_role mmu_role;
> > +       bool root_unsync;
> >         u8 root_level;
> >         u8 shadow_root_level;
> >         u8 ept_ad;
> >         bool direct_map;
> > +       bool unsync_roots[KVM_MMU_NUM_PREV_ROOTS];
> >         struct kvm_mmu_root_info prev_roots[KVM_MMU_NUM_PREV_ROOTS];
> >
> 
> Hello
> 
> I think it is too complicated.  And it is hard to accept to put "unsync"
> out of struct kvm_mmu_root_info when they should be bound to each other.

I agree it's a bit ugly to have the separate unsync_roots array, but I don't see
how it's any more complex.  It's literally a single swap() call.

> How about this:
> - KVM_MMU_NUM_PREV_ROOTS
> + KVM_MMU_NUM_CACHED_ROOTS
> - mmu->prev_roots[KVM_MMU_NUM_PREV_ROOTS]
> + mmu->cached_roots[KVM_MMU_NUM_CACHED_ROOTS]

I don't have a strong preference on PREV vs CACHED.  CACHED is probably more
intuitive, but KVM isn't truly caching the root, it's just tracking the HPA (and
PGD for indirect MMUs), e.g. the root may no longer exist if the backing shadow
page was zapped.  On the other hand, the main helper is cached_root_available()...

> - mmu->root_hpa
> + mmu->cached_roots[0].hpa
> - mmu->root_pgd
> + mmu->cached_roots[0].pgd
> 
> And using the bit63 in @pgd as the information that it is not requested

FWIW, using bit 0 will likely generate more efficient code.

> to sync since the last sync.

Again, I don't have a super strong preference.  I don't hate or love either one :-)

Vitaly, Paolo, any preferences on names and approaches for tracking if a "cached"
root is unsync?

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-08-03 15:57 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-25 21:39 [RFC PATCH] kvm/x86: Keep root hpa in prev_roots as much as possible Lai Jiangshan
2021-07-29 18:05 ` Sean Christopherson
2021-08-03  1:19   ` Lai Jiangshan
2021-08-03 15:57     ` Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).