* [PATCH] KVM: MMU: speedup update_permission_bitmask
@ 2017-08-24 15:56 Paolo Bonzini
  2017-08-28 19:42 ` Jim Mattson
  2017-08-29 16:46 ` David Hildenbrand
From: Paolo Bonzini @ 2017-08-24 15:56 UTC
  To: linux-kernel, kvm; +Cc: jmattson, david

update_permission_bitmask currently does a 128-iteration loop to,
essentially, compute a constant array.  Computing the 8 bits in parallel
reduces it to 16 iterations, and is enough to speed it up substantially
because many boolean operations in the inner loop become constants or
simplify noticeably.

Because update_permission_bitmask is actually the top item in the profile
for nested vmexits, this speeds up an L2->L1 vmexit by about ten thousand
clock cycles, or up to 30%:

                                         before     after
   cpuid                                 35173      25954
   vmcall                                35122      27079
   inl_from_pmtimer                      52635      42675
   inl_from_qemu                         53604      44599
   inl_from_kernel                       38498      30798
   outl_to_kernel                        34508      28816
   wr_tsc_adjust_msr                     34185      26818
   rd_tsc_adjust_msr                     37409      27049
   mmio-no-eventfd:pci-mem               50563      45276
   mmio-wildcard-eventfd:pci-mem         34495      30823
   mmio-datamatch-eventfd:pci-mem        35612      31071
   portio-no-eventfd:pci-io              44925      40661
   portio-wildcard-eventfd:pci-io        29708      27269
   portio-datamatch-eventfd:pci-io       31135      27164

(I wrote a small C program to compare the tables for all values of CR0.WP,
CR4.SMAP and CR4.SMEP, and they match).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu.c | 121 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 70 insertions(+), 51 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f47cccace1a1..2a8a6e3e2a31 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4204,66 +4204,85 @@ static inline bool boot_cpu_is_amd(void)
 				    boot_cpu_data.x86_phys_bits, execonly);
 }
 
+#define BYTE_MASK(access) \
+	((1 & (access) ? 2 : 0) | \
+	 (2 & (access) ? 4 : 0) | \
+	 (3 & (access) ? 8 : 0) | \
+	 (4 & (access) ? 16 : 0) | \
+	 (5 & (access) ? 32 : 0) | \
+	 (6 & (access) ? 64 : 0) | \
+	 (7 & (access) ? 128 : 0))
+
+
 static void update_permission_bitmask(struct kvm_vcpu *vcpu,
 				      struct kvm_mmu *mmu, bool ept)
 {
-	unsigned bit, byte, pfec;
-	u8 map;
-	bool fault, x, w, u, wf, uf, ff, smapf, cr4_smap, cr4_smep, smap = 0;
+	unsigned byte;
+
+	const u8 x = BYTE_MASK(ACC_EXEC_MASK);
+	const u8 w = BYTE_MASK(ACC_WRITE_MASK);
+	const u8 u = BYTE_MASK(ACC_USER_MASK);
+
+	bool cr4_smep = kvm_read_cr4_bits(vcpu, X86_CR4_SMEP) != 0;
+	bool cr4_smap = kvm_read_cr4_bits(vcpu, X86_CR4_SMAP) != 0;
+	bool cr0_wp = is_write_protection(vcpu);
 
-	cr4_smep = kvm_read_cr4_bits(vcpu, X86_CR4_SMEP);
-	cr4_smap = kvm_read_cr4_bits(vcpu, X86_CR4_SMAP);
 	for (byte = 0; byte < ARRAY_SIZE(mmu->permissions); ++byte) {
-		pfec = byte << 1;
-		map = 0;
-		wf = pfec & PFERR_WRITE_MASK;
-		uf = pfec & PFERR_USER_MASK;
-		ff = pfec & PFERR_FETCH_MASK;
+		unsigned pfec = byte << 1;
+
 		/*
-		 * PFERR_RSVD_MASK bit is set in PFEC if the access is not
-		 * subject to SMAP restrictions, and cleared otherwise. The
-		 * bit is only meaningful if the SMAP bit is set in CR4.
+		 * Each "*f" variable has a 1 bit for each UWX value
+		 * that causes a fault with the given PFEC.
 		 */
-		smapf = !(pfec & PFERR_RSVD_MASK);
-		for (bit = 0; bit < 8; ++bit) {
-			x = bit & ACC_EXEC_MASK;
-			w = bit & ACC_WRITE_MASK;
-			u = bit & ACC_USER_MASK;
-
-			if (!ept) {
-				/* Not really needed: !nx will cause pte.nx to fault */
-				x |= !mmu->nx;
-				/* Allow supervisor writes if !cr0.wp */
-				w |= !is_write_protection(vcpu) && !uf;
-				/* Disallow supervisor fetches of user code if cr4.smep */
-				x &= !(cr4_smep && u && !uf);
-
-				/*
-				 * SMAP:kernel-mode data accesses from user-mode
-				 * mappings should fault. A fault is considered
-				 * as a SMAP violation if all of the following
-				 * conditions are ture:
-				 *   - X86_CR4_SMAP is set in CR4
-				 *   - A user page is accessed
-				 *   - Page fault in kernel mode
-				 *   - if CPL = 3 or X86_EFLAGS_AC is clear
-				 *
-				 *   Here, we cover the first three conditions.
-				 *   The fourth is computed dynamically in
-				 *   permission_fault() and is in smapf.
-				 *
-				 *   Also, SMAP does not affect instruction
-				 *   fetches, add the !ff check here to make it
-				 *   clearer.
-				 */
-				smap = cr4_smap && u && !uf && !ff;
-			}
 
-			fault = (ff && !x) || (uf && !u) || (wf && !w) ||
-				(smapf && smap);
-			map |= fault << bit;
+		/* Faults from writes to non-writable pages */
+		u8 wf = (pfec & PFERR_WRITE_MASK) ? ~w : 0;
+		/* Faults from user mode accesses to supervisor pages */
+		u8 uf = (pfec & PFERR_USER_MASK) ? ~u : 0;
+		/* Faults from fetches of non-executable pages */
+		u8 ff = (pfec & PFERR_FETCH_MASK) ? ~x : 0;
+		/* Faults from kernel mode fetches of user pages */
+		u8 smepf = 0;
+		/* Faults from kernel mode accesses of user pages */
+		u8 smapf = 0;
+
+		if (!ept) {
+			/* Faults from kernel mode accesses to user pages */
+			u8 kf = (pfec & PFERR_USER_MASK) ? 0 : u;
+
+			/* Not really needed: !nx will cause pte.nx to fault */
+			if (!mmu->nx)
+				ff = 0;
+
+			/* Allow supervisor writes if !cr0.wp */
+			if (!cr0_wp)
+				wf = (pfec & PFERR_USER_MASK) ? wf : 0;
+
+			/* Disallow supervisor fetches of user code if cr4.smep */
+			if (cr4_smep)
+				smepf = (pfec & PFERR_FETCH_MASK) ? kf : 0;
+
+			/*
+			 * SMAP: kernel-mode data accesses from user-mode
+			 * mappings should fault.  A fault is considered
+			 * a SMAP violation if all of the following
+			 * conditions are true:
+			 *   - X86_CR4_SMAP is set in CR4
+			 *   - A user page is accessed
+			 *   - The access is not a fetch
+			 *   - The page fault occurs in kernel mode
+			 *   - CPL = 3 or X86_EFLAGS_AC is clear
+			 *
+			 * Here, we cover the first four conditions.
+			 * The fifth is computed dynamically in permission_fault();
+			 * the PFERR_RSVD_MASK bit will be set in PFEC if the access
+			 * is *not* subject to SMAP restrictions.
+			 */
+			if (cr4_smap)
+				smapf = (pfec & (PFERR_RSVD_MASK|PFERR_FETCH_MASK)) ? 0 : kf;
 		}
-		mmu->permissions[byte] = map;
+
+		mmu->permissions[byte] = ff | uf | wf | smepf | smapf;
 	}
 }
 
-- 
1.8.3.1
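
The "small C program" mentioned in the commit message was not posted with the
patch.  The stand-alone sketch below reconstructs what such a checker might
look like: it computes every table entry both the old way (per-bit inner loop)
and the new way (bit-parallel), sweeping CR0.WP, CR4.SMAP and CR4.SMEP -- plus
NX for good measure -- for the !ept case.  The ACC_* and PFERR_* values mirror
the kernel's definitions; the program itself is illustrative, not Paolo's
actual checker.

#include <stdio.h>

#define ACC_EXEC_MASK	1
#define ACC_WRITE_MASK	2
#define ACC_USER_MASK	4

#define PFERR_WRITE_MASK	(1U << 1)
#define PFERR_USER_MASK		(1U << 2)
#define PFERR_RSVD_MASK		(1U << 3)
#define PFERR_FETCH_MASK	(1U << 4)

#define BYTE_MASK(access) \
	((1 & (access) ? 2 : 0) | \
	 (2 & (access) ? 4 : 0) | \
	 (3 & (access) ? 8 : 0) | \
	 (4 & (access) ? 16 : 0) | \
	 (5 & (access) ? 32 : 0) | \
	 (6 & (access) ? 64 : 0) | \
	 (7 & (access) ? 128 : 0))

/* Old algorithm: 8-iteration inner loop per PFEC value. */
static unsigned char old_entry(unsigned pfec, int wp, int smap, int smep, int nx)
{
	unsigned char map = 0;
	unsigned bit;
	int wf = pfec & PFERR_WRITE_MASK;
	int uf = pfec & PFERR_USER_MASK;
	int ff = pfec & PFERR_FETCH_MASK;
	int smapf = !(pfec & PFERR_RSVD_MASK);

	for (bit = 0; bit < 8; ++bit) {
		int x = !!(bit & ACC_EXEC_MASK);
		int w = !!(bit & ACC_WRITE_MASK);
		int u = !!(bit & ACC_USER_MASK);
		int smap_page;

		x |= !nx;
		w |= !wp && !uf;
		x &= !(smep && u && !uf);
		smap_page = smap && u && !uf && !ff;

		if ((ff && !x) || (uf && !u) || (wf && !w) ||
		    (smapf && smap_page))
			map |= 1 << bit;
	}
	return map;
}

/* New algorithm: all 8 bits at once, as in the patch (!ept path). */
static unsigned char new_entry(unsigned pfec, int wp, int smap, int smep, int nx)
{
	const unsigned char x = BYTE_MASK(ACC_EXEC_MASK);
	const unsigned char w = BYTE_MASK(ACC_WRITE_MASK);
	const unsigned char u = BYTE_MASK(ACC_USER_MASK);
	unsigned char wf = (pfec & PFERR_WRITE_MASK) ? ~w : 0;
	unsigned char uf = (pfec & PFERR_USER_MASK) ? ~u : 0;
	unsigned char ff = (pfec & PFERR_FETCH_MASK) ? ~x : 0;
	unsigned char kf = (pfec & PFERR_USER_MASK) ? 0 : u;
	unsigned char smepf = 0, smapf = 0;

	if (!nx)
		ff = 0;
	if (!wp)
		wf = (pfec & PFERR_USER_MASK) ? wf : 0;
	if (smep)
		smepf = (pfec & PFERR_FETCH_MASK) ? kf : 0;
	if (smap)
		smapf = (pfec & (PFERR_RSVD_MASK | PFERR_FETCH_MASK)) ? 0 : kf;

	return ff | uf | wf | smepf | smapf;
}

int main(void)
{
	unsigned byte, cfg;

	for (cfg = 0; cfg < 16; cfg++)
		for (byte = 0; byte < 16; byte++) {
			unsigned pfec = byte << 1;
			int wp = cfg & 1, smap = cfg & 2;
			int smep = cfg & 4, nx = cfg & 8;

			if (old_entry(pfec, wp, smap, smep, nx) !=
			    new_entry(pfec, wp, smap, smep, nx)) {
				printf("mismatch at cfg=%u byte=%u\n", cfg, byte);
				return 1;
			}
		}
	printf("tables match\n");
	return 0;
}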


* Re: [PATCH] KVM: MMU: speedup update_permission_bitmask
  2017-08-24 15:56 [PATCH] KVM: MMU: speedup update_permission_bitmask Paolo Bonzini
@ 2017-08-28 19:42 ` Jim Mattson
  2017-09-12 16:48   ` Peter Feiner
  2017-08-29 16:46 ` David Hildenbrand
From: Jim Mattson @ 2017-08-28 19:42 UTC
  To: Paolo Bonzini; +Cc: LKML, kvm list, David Hildenbrand

Looks okay to me, but I'm hoping Peter will chime in.

Reviewed-by: Jim Mattson <jmattson@google.com>

On Thu, Aug 24, 2017 at 8:56 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> [...]


* Re: [PATCH] KVM: MMU: speedup update_permission_bitmask
  2017-08-24 15:56 [PATCH] KVM: MMU: speedup update_permission_bitmask Paolo Bonzini
  2017-08-28 19:42 ` Jim Mattson
@ 2017-08-29 16:46 ` David Hildenbrand
  2017-09-12  8:20   ` Paolo Bonzini
From: David Hildenbrand @ 2017-08-29 16:46 UTC
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: jmattson

On 24.08.2017 17:56, Paolo Bonzini wrote:
> [...]
> +#define BYTE_MASK(access) \
> +	((1 & (access) ? 2 : 0) | \
> +	 (2 & (access) ? 4 : 0) | \
> +	 (3 & (access) ? 8 : 0) | \
> +	 (4 & (access) ? 16 : 0) | \
> +	 (5 & (access) ? 32 : 0) | \
> +	 (6 & (access) ? 64 : 0) | \
> +	 (7 & (access) ? 128 : 0))
> +

Hmm, I wonder if

#define BYTE_MASK(access) \
	((1 & (access) ? (1 << 1) : 0) | \
	 (2 & (access) ? (1 << 2) : 0) | \
...

would be easier to get

> +
>  static void update_permission_bitmask(struct kvm_vcpu *vcpu,
>  				      struct kvm_mmu *mmu, bool ept)
>  {
> -	unsigned bit, byte, pfec;
> -	u8 map;
> -	bool fault, x, w, u, wf, uf, ff, smapf, cr4_smap, cr4_smep, smap = 0;
> +	unsigned byte;
> +
> +	const u8 x = BYTE_MASK(ACC_EXEC_MASK);
> +	const u8 w = BYTE_MASK(ACC_WRITE_MASK);
> +	const u8 u = BYTE_MASK(ACC_USER_MASK);
> +
> +	bool cr4_smep = kvm_read_cr4_bits(vcpu, X86_CR4_SMEP) != 0;
> +	bool cr4_smap = kvm_read_cr4_bits(vcpu, X86_CR4_SMAP) != 0;
> +	bool cr0_wp = is_write_protection(vcpu);

All three can be turned const (and I'd drop the empty lines in between).

>  
> -	cr4_smep = kvm_read_cr4_bits(vcpu, X86_CR4_SMEP);
> -	cr4_smap = kvm_read_cr4_bits(vcpu, X86_CR4_SMAP);
>  	for (byte = 0; byte < ARRAY_SIZE(mmu->permissions); ++byte) {
> -		pfec = byte << 1;
> -		map = 0;
> -		wf = pfec & PFERR_WRITE_MASK;
> -		uf = pfec & PFERR_USER_MASK;
> -		ff = pfec & PFERR_FETCH_MASK;
> +		unsigned pfec = byte << 1;

This one can be turned const, too. And I think you could get rid of
"byte" and directly work on "pfec" ... but not sure if that is better.
Most probably not.

> [...]

My brain hurts, but I am pretty confident that this is correct. :)

-- 

Thanks,

David


* Re: [PATCH] KVM: MMU: speedup update_permission_bitmask
  2017-08-29 16:46 ` David Hildenbrand
@ 2017-09-12  8:20   ` Paolo Bonzini
From: Paolo Bonzini @ 2017-09-12  8:20 UTC
  To: David Hildenbrand, linux-kernel, kvm; +Cc: jmattson

On 29/08/2017 18:46, David Hildenbrand wrote:
>> +#define BYTE_MASK(access) \
>> +	((1 & (access) ? 2 : 0) | \
>> +	 (2 & (access) ? 4 : 0) | \
>> +	 (3 & (access) ? 8 : 0) | \
>> +	 (4 & (access) ? 16 : 0) | \
>> +	 (5 & (access) ? 32 : 0) | \
>> +	 (6 & (access) ? 64 : 0) | \
>> +	 (7 & (access) ? 128 : 0))
>> +
> Hmm, I wonder if
> 
> #define BYTE_MASK(access) \
> 	((1 & (access) ? (1 << 1) : 0) | \
> 	 (2 & (access) ? (1 << 2) : 0) | \
> ...
> 
> would be easier to get
> 

Yeah, you have a point.  Another way to write it is:

   (1 & (access) ? 0xAA : 0) | \
   (2 & (access) ? 0xCC : 0) | \
   (4 & (access) ? 0xF0 : 0)

but I think yours is the best.
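
For reference, with the mmu.h values ACC_EXEC_MASK = 1, ACC_WRITE_MASK = 2
and ACC_USER_MASK = 4, all three formulations expand to the same constants.
A stand-alone compile-time check (illustration only, not kernel code):

#define ACC_EXEC_MASK	1
#define ACC_WRITE_MASK	2
#define ACC_USER_MASK	4

#define BYTE_MASK(access) \
	((1 & (access) ? 2 : 0) | \
	 (2 & (access) ? 4 : 0) | \
	 (3 & (access) ? 8 : 0) | \
	 (4 & (access) ? 16 : 0) | \
	 (5 & (access) ? 32 : 0) | \
	 (6 & (access) ? 64 : 0) | \
	 (7 & (access) ? 128 : 0))

/* Bit b (b = 1..7) is set whenever the UWX value b includes 'access'. */
_Static_assert(BYTE_MASK(ACC_EXEC_MASK)  == 0xAA, "bits 1,3,5,7");
_Static_assert(BYTE_MASK(ACC_WRITE_MASK) == 0xCC, "bits 2,3,6,7");
_Static_assert(BYTE_MASK(ACC_USER_MASK)  == 0xF0, "bits 4,5,6,7");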

>>
>> +
>> +	const u8 x = BYTE_MASK(ACC_EXEC_MASK);
>> +	const u8 w = BYTE_MASK(ACC_WRITE_MASK);
>> +	const u8 u = BYTE_MASK(ACC_USER_MASK);
>> +
>> +	bool cr4_smep = kvm_read_cr4_bits(vcpu, X86_CR4_SMEP) != 0;
>> +	bool cr4_smap = kvm_read_cr4_bits(vcpu, X86_CR4_SMAP) != 0;
>> +	bool cr0_wp = is_write_protection(vcpu);
> 
> all three can be turned const. (and I'd drop the empty lines in between ..)

I am using const to identify a compile-time constant here, so more like
"static const" (but I was not sure if C optimizes away static const, so
I just used "const").  This also explains the empty line!

Paolo


* Re: [PATCH] KVM: MMU: speedup update_permission_bitmask
  2017-08-28 19:42 ` Jim Mattson
@ 2017-09-12 16:48   ` Peter Feiner
  2017-09-12 19:55     ` Paolo Bonzini
From: Peter Feiner @ 2017-09-12 16:48 UTC
  To: Jim Mattson; +Cc: Paolo Bonzini, LKML, kvm list, David Hildenbrand

On Mon, Aug 28, 2017 at 12:42 PM, Jim Mattson <jmattson@google.com> wrote:
>
> Looks okay to me, but I'm hoping Peter will chime in.

Sorry, this slipped by. Busy couple of weeks!

>
>
> Reviewed-by: Jim Mattson <jmattson@google.com>
>
> On Thu, Aug 24, 2017 at 8:56 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > update_permission_bitmask currently does a 128-iteration loop to,
> > essentially, compute a constant array.  Computing the 8 bits in parallel
> > reduces it to 16 iterations, and is enough to speed it up substantially
> > because many boolean operations in the inner loop become constants or
> > simplify noticeably.
> >
> > Because update_permission_bitmask is actually the top item in the profile
> > for nested vmexits, this speeds up an L2->L1 vmexit by about ten thousand
> > clock cycles, or up to 30%:

This is a great improvement! Why not take it a step further and
compute the whole table once at module init time and be done with it?
There are only 5 extra input bits (nx, ept, smep, smap, wp), so the
whole table would only take up (1 << 5) * 16 = 512 bytes. Moreover, if
you had 32 VMs on the host, you'd actually save memory!
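
A rough sketch of that suggestion, for concreteness.  All names here are
hypothetical, and compute_permission_bitmask() stands in for the existing
per-PFEC loop refactored to take its inputs explicitly; this is not code
from any tree:

#define PERM_NX		(1 << 0)
#define PERM_EPT	(1 << 1)
#define PERM_SMEP	(1 << 2)
#define PERM_SMAP	(1 << 3)
#define PERM_WP		(1 << 4)

/* (1 << 5) input combinations x 16 PFEC values = 512 bytes total. */
static u8 permission_tables[1 << 5][16];

static void __init init_permission_tables(void)
{
	unsigned cfg;

	for (cfg = 0; cfg < (1 << 5); cfg++)
		compute_permission_bitmask(permission_tables[cfg],
					   cfg & PERM_NX, cfg & PERM_EPT,
					   cfg & PERM_SMEP, cfg & PERM_SMAP,
					   cfg & PERM_WP);
}

/* update_permission_bitmask() then reduces to computing an index: */
static void update_permission_bitmask(struct kvm_vcpu *vcpu,
				      struct kvm_mmu *mmu, bool ept)
{
	unsigned cfg = (mmu->nx ? PERM_NX : 0) |
		       (ept ? PERM_EPT : 0) |
		       (kvm_read_cr4_bits(vcpu, X86_CR4_SMEP) ? PERM_SMEP : 0) |
		       (kvm_read_cr4_bits(vcpu, X86_CR4_SMAP) ? PERM_SMAP : 0) |
		       (is_write_protection(vcpu) ? PERM_WP : 0);

	memcpy(mmu->permissions, permission_tables[cfg],
	       sizeof(mmu->permissions));
}

(Turning mmu->permissions into a pointer into permission_tables[], instead
of copying, would be what actually saves memory across many VMs.)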

> > [...]
> > +               /* Faults from writes to non-writable pages */
> > +               u8 wf = (pfec & PFERR_WRITE_MASK) ? ~w : 0;
> > +               /* Faults from user mode accesses to supervisor pages */
> > +               u8 uf = (pfec & PFERR_USER_MASK) ? ~u : 0;

Aside: In the EPT case, I wondered why it's necessary to construct any
of the mmu->permissions where PFERR_USER_MASK is set. EPT doesn't have
a 'user' bit. Digging through the code, I see that vmx.c sets
PFERR_USER_MASK for read violations, which normal x86 page tables
don't have. I'm not sure to what effect, but it's used for something
:-)
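
As context for how those PFEC bits are consumed, here is a simplified model
of the lookup in permission_fault().  The real function also folds in PKU
and the dynamic SMAP/EFLAGS.AC adjustment before indexing, so treat this
purely as a sketch of the indexing scheme:

/* PFEC bit 0 (the "present" bit) is shifted out; bits 1..4 (write, user,
 * rsvd, fetch) select one of the 16 table entries.  Bit pte_access of the
 * entry (pte_access = the page's UWX triple, 0..7) says whether that
 * combination faults. */
static bool access_faults(struct kvm_mmu *mmu, unsigned pfec,
			  unsigned pte_access)
{
	return (mmu->permissions[pfec >> 1] >> pte_access) & 1;
}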

> > [...]


* Re: [PATCH] KVM: MMU: speedup update_permission_bitmask
  2017-09-12 16:48   ` Peter Feiner
@ 2017-09-12 19:55     ` Paolo Bonzini
  2017-09-12 20:06       ` Peter Feiner
From: Paolo Bonzini @ 2017-09-12 19:55 UTC
  To: Peter Feiner, Jim Mattson; +Cc: LKML, kvm list, David Hildenbrand

On 12/09/2017 18:48, Peter Feiner wrote:
>>>
>>> Because update_permission_bitmask is actually the top item in the profile
>>> for nested vmexits, this speeds up an L2->L1 vmexit by about ten thousand
>>> clock cycles, or up to 30%:
>
> This is a great improvement! Why not take it a step further and
> compute the whole table once at module init time and be done with it?
> There are only 5 extra input bits (nx, ept, smep, smap, wp),

4 actually, nx could be ignored (because unlike WP, the bit is reserved
when nx is disabled).  It is only handled for clarity.

> so the
> whole table would only take up (1 << 5) * 16 = 512 bytes. Moreover, if
> you had 32 VMs on the host, you'd actually save memory!

Indeed; my thought was to write a script or something to generate the
tables at compile time, but doing it at module init time would be clever
and easier.

That said, the generated code for the function is already pretty good.
If precomputing saved 1000 clock cycles per nested vmexit it would be
very convincing; at 50 or even 100, a bit less so.

Paolo


* Re: [PATCH] KVM: MMU: speedup update_permission_bitmask
  2017-09-12 19:55     ` Paolo Bonzini
@ 2017-09-12 20:06       ` Peter Feiner
From: Peter Feiner @ 2017-09-12 20:06 UTC
  To: Paolo Bonzini; +Cc: Jim Mattson, LKML, kvm list, David Hildenbrand

On Tue, Sep 12, 2017 at 12:55 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> [...]
>
> That said, the generated code for the function is already pretty good.
> If precomputing saved 1000 clock cycles per nested vmexit it would be
> very convincing; at 50 or even 100, a bit less so.

ACK. I'm good with either approach :-) Please consider this one

Reviewed-By: Peter Feiner <pfeiner@google.com>

