kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] [kvm-unit-tests PATCH] x86/access: Fix intermittent test failure
@ 2021-04-09  7:55 Yang Weijiang
  2021-04-09 17:29 ` Paolo Bonzini
  0 siblings, 1 reply; 8+ messages in thread
From: Yang Weijiang @ 2021-04-09  7:55 UTC (permalink / raw)
  To: pbonzini, kvm; +Cc: Yang Weijiang

During kvm-unit-test, below failure pattern is observed, this is due to testing thread
migration + cache "lazy" flush during test, so forcely flush the cache to avoid the issue.
Pin the test app to certain physical CPU can fix the issue as well. The error report is
misleading, pke is the victim of the issue.

test user cr4.pke: FAIL: error code 5 expected 4
Dump mapping: address: 0x123400000000
------L4: 21ea007
------L3: 21eb007
------L2: 21ec000
------L1: 2000000

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 x86/access.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/x86/access.c b/x86/access.c
index 7dc9eb6..379d533 100644
--- a/x86/access.c
+++ b/x86/access.c
@@ -211,6 +211,8 @@ static unsigned set_cr4_smep(int smep)
         ptl2[2] |= PT_USER_MASK;
     if (!r)
         shadow_cr4 = cr4;
+
+    invlpg((void *)(ptl2[2] & ~PAGE_SIZE));
     return r;
 }
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] [kvm-unit-tests PATCH] x86/access: Fix intermittent test failure
  2021-04-09  7:55 [PATCH] [kvm-unit-tests PATCH] x86/access: Fix intermittent test failure Yang Weijiang
@ 2021-04-09 17:29 ` Paolo Bonzini
  2021-04-09 18:59   ` Sean Christopherson
  0 siblings, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2021-04-09 17:29 UTC (permalink / raw)
  To: Yang Weijiang, kvm

On 09/04/21 09:55, Yang Weijiang wrote:
> During kvm-unit-test, below failure pattern is observed, this is due to testing thread
> migration + cache "lazy" flush during test, so forcely flush the cache to avoid the issue.
> Pin the test app to certain physical CPU can fix the issue as well. The error report is
> misleading, pke is the victim of the issue.
> 
> test user cr4.pke: FAIL: error code 5 expected 4
> Dump mapping: address: 0x123400000000
> ------L4: 21ea007
> ------L3: 21eb007
> ------L2: 21ec000
> ------L1: 2000000
> 
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> ---
>   x86/access.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/x86/access.c b/x86/access.c
> index 7dc9eb6..379d533 100644
> --- a/x86/access.c
> +++ b/x86/access.c
> @@ -211,6 +211,8 @@ static unsigned set_cr4_smep(int smep)
>           ptl2[2] |= PT_USER_MASK;
>       if (!r)
>           shadow_cr4 = cr4;
> +
> +    invlpg((void *)(ptl2[2] & ~PAGE_SIZE));
>       return r;
>   }
>   
> 

Applied, thanks.

Paolo


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] [kvm-unit-tests PATCH] x86/access: Fix intermittent test failure
  2021-04-09 17:29 ` Paolo Bonzini
@ 2021-04-09 18:59   ` Sean Christopherson
  2021-04-10 14:16     ` Paolo Bonzini
  2021-04-12 13:25     ` Yang Weijiang
  0 siblings, 2 replies; 8+ messages in thread
From: Sean Christopherson @ 2021-04-09 18:59 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Yang Weijiang, kvm

On Fri, Apr 09, 2021, Paolo Bonzini wrote:
> On 09/04/21 09:55, Yang Weijiang wrote:
> > During kvm-unit-test, below failure pattern is observed, this is due to testing thread
> > migration + cache "lazy" flush during test, so forcely flush the cache to avoid the issue.
> > Pin the test app to certain physical CPU can fix the issue as well. The error report is
> > misleading, pke is the victim of the issue.
> > 
> > test user cr4.pke: FAIL: error code 5 expected 4
> > Dump mapping: address: 0x123400000000
> > ------L4: 21ea007
> > ------L3: 21eb007
> > ------L2: 21ec000
> > ------L1: 2000000
> > 
> > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > ---
> >   x86/access.c | 2 ++
> >   1 file changed, 2 insertions(+)
> > 
> > diff --git a/x86/access.c b/x86/access.c
> > index 7dc9eb6..379d533 100644
> > --- a/x86/access.c
> > +++ b/x86/access.c
> > @@ -211,6 +211,8 @@ static unsigned set_cr4_smep(int smep)
> >           ptl2[2] |= PT_USER_MASK;
> >       if (!r)
> >           shadow_cr4 = cr4;
> > +
> > +    invlpg((void *)(ptl2[2] & ~PAGE_SIZE));
> >       return r;
> >   }
> > 
> 
> Applied, thanks.

Egad, I can't keep up with this new Paolo :-D


Would it also work to move the existing invlpg() into ac_test_do_access()?

diff --git a/x86/access.c b/x86/access.c
index 7dc9eb6..5f335dd 100644
--- a/x86/access.c
+++ b/x86/access.c
@@ -451,8 +451,6 @@ fault:

 static void ac_set_expected_status(ac_test_t *at)
 {
-    invlpg(at->virt);
-
     if (at->ptep)
        at->expected_pte = *at->ptep;
     at->expected_pde = *at->pdep;
@@ -658,6 +656,9 @@ static int ac_test_do_access(ac_test_t *at)

     set_cr4_smep(F(AC_CPU_CR4_SMEP));

+    /* Flush after _all_ setup is done, toggling SMEP may also modify PMDs. */
+    invlpg(at->virt);
+
     if (F(AC_ACCESS_TWICE)) {
        asm volatile (
            "mov $fixed2, %%rsi \n\t

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] [kvm-unit-tests PATCH] x86/access: Fix intermittent test failure
  2021-04-09 18:59   ` Sean Christopherson
@ 2021-04-10 14:16     ` Paolo Bonzini
  2021-04-12 13:25     ` Yang Weijiang
  1 sibling, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2021-04-10 14:16 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Yang Weijiang, kvm

On 09/04/21 20:59, Sean Christopherson wrote:
> Would it also work to move the existing invlpg() into ac_test_do_access()?

That would be better but I was afraid of the possible changes to the 
tests outside the big loop.

I'm sending a patch now.

Paolo


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] [kvm-unit-tests PATCH] x86/access: Fix intermittent test failure
  2021-04-09 18:59   ` Sean Christopherson
  2021-04-10 14:16     ` Paolo Bonzini
@ 2021-04-12 13:25     ` Yang Weijiang
  2021-04-12 18:30       ` Sean Christopherson
  1 sibling, 1 reply; 8+ messages in thread
From: Yang Weijiang @ 2021-04-12 13:25 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, Yang Weijiang, kvm

On Fri, Apr 09, 2021 at 06:59:45PM +0000, Sean Christopherson wrote:
> On Fri, Apr 09, 2021, Paolo Bonzini wrote:
> > On 09/04/21 09:55, Yang Weijiang wrote:
> > > During kvm-unit-test, below failure pattern is observed, this is due to testing thread
> > > migration + cache "lazy" flush during test, so forcely flush the cache to avoid the issue.
> > > Pin the test app to certain physical CPU can fix the issue as well. The error report is
> > > misleading, pke is the victim of the issue.
> > > 
> > > test user cr4.pke: FAIL: error code 5 expected 4
> > > Dump mapping: address: 0x123400000000
> > > ------L4: 21ea007
> > > ------L3: 21eb007
> > > ------L2: 21ec000
> > > ------L1: 2000000
> > > 
> > > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > > ---
> > >   x86/access.c | 2 ++
> > >   1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/x86/access.c b/x86/access.c
> > > index 7dc9eb6..379d533 100644
> > > --- a/x86/access.c
> > > +++ b/x86/access.c
> > > @@ -211,6 +211,8 @@ static unsigned set_cr4_smep(int smep)
> > >           ptl2[2] |= PT_USER_MASK;
> > >       if (!r)
> > >           shadow_cr4 = cr4;
> > > +
> > > +    invlpg((void *)(ptl2[2] & ~PAGE_SIZE));
> > >       return r;
> > >   }
> > > 
> > 
> > Applied, thanks.
> 
> Egad, I can't keep up with this new Paolo :-D
> 
> 
> Would it also work to move the existing invlpg() into ac_test_do_access()?
>
Hi, Sean,
You patch works for the app on my side, but one thing makes my confused, my patch
invalidates the mapping for test code(ac_test_do_access), but your patch invlidates
at->virt, they're not mapped to the same page. Why it works?

I simplified the test by only executing two patterns as below:

printf("\n############# start test ############\n\n");
at.flags = 0x8000000;
ac_test_exec(&at, &pool);
at.flags = 0x200000; /* or 0x10200000 */
ac_test_exec(&at, &pool);
printf("############# end test ############\n\n");

with your patch I still got error code 5 while getting  error code 4 with my patch.
What makes it different?

> diff --git a/x86/access.c b/x86/access.c
> index 7dc9eb6..5f335dd 100644
> --- a/x86/access.c
> +++ b/x86/access.c
> @@ -451,8 +451,6 @@ fault:
> 
>  static void ac_set_expected_status(ac_test_t *at)
>  {
> -    invlpg(at->virt);
> -
>      if (at->ptep)
>         at->expected_pte = *at->ptep;
>      at->expected_pde = *at->pdep;
> @@ -658,6 +656,9 @@ static int ac_test_do_access(ac_test_t *at)
> 
>      set_cr4_smep(F(AC_CPU_CR4_SMEP));
> 
> +    /* Flush after _all_ setup is done, toggling SMEP may also modify PMDs. */
> +    invlpg(at->virt);
> +
>      if (F(AC_ACCESS_TWICE)) {
>         asm volatile (
>             "mov $fixed2, %%rsi \n\t

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] [kvm-unit-tests PATCH] x86/access: Fix intermittent test failure
  2021-04-12 13:25     ` Yang Weijiang
@ 2021-04-12 18:30       ` Sean Christopherson
  2021-04-13  3:13         ` Yang Weijiang
  0 siblings, 1 reply; 8+ messages in thread
From: Sean Christopherson @ 2021-04-12 18:30 UTC (permalink / raw)
  To: Yang Weijiang; +Cc: Paolo Bonzini, kvm

On Mon, Apr 12, 2021, Yang Weijiang wrote:
> On Fri, Apr 09, 2021 at 06:59:45PM +0000, Sean Christopherson wrote:
> > On Fri, Apr 09, 2021, Paolo Bonzini wrote:
> > > On 09/04/21 09:55, Yang Weijiang wrote:
> > > > During kvm-unit-test, below failure pattern is observed, this is due to testing thread
> > > > migration + cache "lazy" flush during test, so forcely flush the cache to avoid the issue.
> > > > Pin the test app to certain physical CPU can fix the issue as well. The error report is
> > > > misleading, pke is the victim of the issue.
> > > > 
> > > > test user cr4.pke: FAIL: error code 5 expected 4
> > > > Dump mapping: address: 0x123400000000
> > > > ------L4: 21ea007
> > > > ------L3: 21eb007
> > > > ------L2: 21ec000
> > > > ------L1: 2000000
> > > > 
> > > > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > > > ---
> > > >   x86/access.c | 2 ++
> > > >   1 file changed, 2 insertions(+)
> > > > 
> > > > diff --git a/x86/access.c b/x86/access.c
> > > > index 7dc9eb6..379d533 100644
> > > > --- a/x86/access.c
> > > > +++ b/x86/access.c
> > > > @@ -211,6 +211,8 @@ static unsigned set_cr4_smep(int smep)
> > > >           ptl2[2] |= PT_USER_MASK;
> > > >       if (!r)
> > > >           shadow_cr4 = cr4;
> > > > +
> > > > +    invlpg((void *)(ptl2[2] & ~PAGE_SIZE));
> > > >       return r;
> > > >   }
> > > > 
> > > 
> > > Applied, thanks.
> > 
> > Egad, I can't keep up with this new Paolo :-D
> > 
> > 
> > Would it also work to move the existing invlpg() into ac_test_do_access()?
> >
> Hi, Sean,
> You patch works for the app on my side, but one thing makes my confused, my patch
> invalidates the mapping for test code(ac_test_do_access), but your patch invlidates
> at->virt, they're not mapped to the same page. Why it works?

I don't know why your patch works.  Best guess is that INVLPG on the PMD is
causing the CPU to flush the entire TLB, i.e. the problematic entry is collateral
damage.

> I simplified the test by only executing two patterns as below:
> 
> printf("\n############# start test ############\n\n");
> at.flags = 0x8000000;
> ac_test_exec(&at, &pool);
> at.flags = 0x200000; /* or 0x10200000 */
> ac_test_exec(&at, &pool);
> printf("############# end test ############\n\n");
> 
> with your patch I still got error code 5 while getting  error code 4 with my patch.
> What makes it different?

Now I'm really confused.  This:

  at.flags = 0x8000000;                                                         
  ac_test_exec(&at, &pool);

runs the test with a not-present PTE.  I don't understand how you are getting
error code '5' (USER + PRESENT) when running the user test; there shouldn't be
anything for at->virt in the TLB.

Are there tests being run before this point?  Even then, explicitly flushing
at->virt should work.  The fact that set_cr4_smep() modifies a PMD and not the
leaf PTE should be irrelevant.  Per the SDM:

  INVLPG also invalidates all entries in all paging-structure caches associated
  with the current PCID, regardless of the linear addresses to which they correspond.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] [kvm-unit-tests PATCH] x86/access: Fix intermittent test failure
  2021-04-12 18:30       ` Sean Christopherson
@ 2021-04-13  3:13         ` Yang Weijiang
  2021-04-13  4:44           ` Yang Weijiang
  0 siblings, 1 reply; 8+ messages in thread
From: Yang Weijiang @ 2021-04-13  3:13 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Yang Weijiang, Paolo Bonzini, kvm

[-- Attachment #1: Type: text/plain, Size: 3727 bytes --]

On Mon, Apr 12, 2021 at 06:30:58PM +0000, Sean Christopherson wrote:
> On Mon, Apr 12, 2021, Yang Weijiang wrote:
> > On Fri, Apr 09, 2021 at 06:59:45PM +0000, Sean Christopherson wrote:
> > > On Fri, Apr 09, 2021, Paolo Bonzini wrote:
> > > > On 09/04/21 09:55, Yang Weijiang wrote:
> > > > > During kvm-unit-test, below failure pattern is observed, this is due to testing thread
> > > > > migration + cache "lazy" flush during test, so forcely flush the cache to avoid the issue.
> > > > > Pin the test app to certain physical CPU can fix the issue as well. The error report is
> > > > > misleading, pke is the victim of the issue.
> > > > > 
> > > > > test user cr4.pke: FAIL: error code 5 expected 4
> > > > > Dump mapping: address: 0x123400000000
> > > > > ------L4: 21ea007
> > > > > ------L3: 21eb007
> > > > > ------L2: 21ec000
> > > > > ------L1: 2000000
> > > > > 
> > > > > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > > > > ---
> > > > >   x86/access.c | 2 ++
> > > > >   1 file changed, 2 insertions(+)
> > > > > 
> > > > > diff --git a/x86/access.c b/x86/access.c
> > > > > index 7dc9eb6..379d533 100644
> > > > > --- a/x86/access.c
> > > > > +++ b/x86/access.c
> > > > > @@ -211,6 +211,8 @@ static unsigned set_cr4_smep(int smep)
> > > > >           ptl2[2] |= PT_USER_MASK;
> > > > >       if (!r)
> > > > >           shadow_cr4 = cr4;
> > > > > +
> > > > > +    invlpg((void *)(ptl2[2] & ~PAGE_SIZE));
> > > > >       return r;
> > > > >   }
> > > > > 
> > > > 
> > > > Applied, thanks.
> > > 
> > > Egad, I can't keep up with this new Paolo :-D
> > > 
> > > 
> > > Would it also work to move the existing invlpg() into ac_test_do_access()?
> > >
> > Hi, Sean,
> > You patch works for the app on my side, but one thing makes my confused, my patch
> > invalidates the mapping for test code(ac_test_do_access), but your patch invlidates
> > at->virt, they're not mapped to the same page. Why it works?
> 
> I don't know why your patch works.  Best guess is that INVLPG on the PMD is
> causing the CPU to flush the entire TLB, i.e. the problematic entry is collateral
> damage.
Hi, Sean,
Maybe you've figured out the root cause, just attached the patch to let you know what I did
for comparison, toggling two different invlpg() got significant different results.

> 
> > I simplified the test by only executing two patterns as below:
> > 
> > printf("\n############# start test ############\n\n");
> > at.flags = 0x8000000;
> > ac_test_exec(&at, &pool);
> > at.flags = 0x200000; /* or 0x10200000 */
> > ac_test_exec(&at, &pool);
> > printf("############# end test ############\n\n");
> > 
> > with your patch I still got error code 5 while getting  error code 4 with my patch.
> > What makes it different?
> 
> Now I'm really confused.  This:
> 
>   at.flags = 0x8000000;                                                         
>   ac_test_exec(&at, &pool);
> 
> runs the test with a not-present PTE.  I don't understand how you are getting
> error code '5' (USER + PRESENT) when running the user test; there shouldn't be
> anything for at->virt in the TLB.
error code '5' is (USER + PG protection violation) :-)

> 
> Are there tests being run before this point?
What I changed is included in the patch.

> Even then, explicitly flushing
> at->virt should work.  The fact that set_cr4_smep() modifies a PMD and not the
> leaf PTE should be irrelevant.  Per the SDM:
> 
>   INVLPG also invalidates all entries in all paging-structure caches associated
>   with the current PCID, regardless of the linear addresses to which they correspond.
Even PCID is not enabled in CR4? If it's the case, it cannot tell why two
different flavor of invalidation makes different results.

Thanks for the comments!

[-- Attachment #2: access.patch --]
[-- Type: text/plain, Size: 2006 bytes --]

diff --git a/x86/access.c b/x86/access.c
index 7dc9eb6..c7bb009 100644
--- a/x86/access.c
+++ b/x86/access.c
@@ -638,6 +638,7 @@ static int ac_test_do_access(ac_test_t *at)
     unsigned long rsp;
     _Bool success = true;
     int flags = at->flags;
+    extern u64 ptl2[];
 
     ++unique;
     if (!(unique & 65535)) {
@@ -670,6 +671,8 @@ static int ac_test_do_access(ac_test_t *at)
 	fault = 0;
     }
 
+    //invlpg((void *)(ptl2[2] & ~PAGE_SIZE));
+    invlpg(at->virt);
     asm volatile ("mov $fixed1, %%rsi \n\t"
 		  "mov %%rsp, %%rdx \n\t"
 		  "cmp $0, %[user] \n\t"
@@ -740,6 +743,8 @@ static int ac_test_do_access(ac_test_t *at)
             printf("PASS\n");
 	}
     }
+    printf("cr0 = 0x%lx, cr3 = 0x%lx, cr4 = 0x%lx\n", read_cr0(), read_cr3(), read_cr4());
+    printf("at->flags = 0x%x, error = 0x%x, fault = 0x%x\n\n", at->flags, e, fault);
     return success;
 }
 
@@ -955,7 +960,7 @@ static int ac_test_run(void)
 {
     ac_test_t at;
     ac_pool_t pool;
-    int i, tests, successes;
+    int tests, successes;
 
     printf("run\n");
     tests = successes = 0;
@@ -1018,7 +1023,17 @@ static int ac_test_run(void)
 
     ac_env_int(&pool);
     ac_test_init(&at, (void *)(0x123400000000 + 16 * smp_id()));
-    do {
+//////////////////////////////////////////////////////////////
+    printf("\n############# start test ############\n\n");
+    at.flags = 0x8000000 - 1;
+    ac_test_bump(&at);
+    ac_test_exec(&at, &pool);
+    at.flags = 0x200000 - 1;
+    ac_test_bump(&at);
+    ac_test_exec(&at, &pool);
+    printf("\n############# end test ############\n\n");
+//////////////////////////////////////////////////////////////
+    /*do {
 	++tests;
 	successes += ac_test_exec(&at, &pool);
     } while (ac_test_bump(&at));
@@ -1026,7 +1041,7 @@ static int ac_test_run(void)
     for (i = 0; i < ARRAY_SIZE(ac_test_cases); i++) {
 	++tests;
 	successes += ac_test_cases[i](&pool);
-    }
+    }*/
 
     printf("\n%d tests, %d failures\n", tests, tests - successes);
 

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] [kvm-unit-tests PATCH] x86/access: Fix intermittent test failure
  2021-04-13  3:13         ` Yang Weijiang
@ 2021-04-13  4:44           ` Yang Weijiang
  0 siblings, 0 replies; 8+ messages in thread
From: Yang Weijiang @ 2021-04-13  4:44 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Yang Weijiang, Paolo Bonzini, kvm

[-- Attachment #1: Type: text/plain, Size: 6236 bytes --]

Sorry, attached wrong patch in previous email, my original purpose is to flush the mapping
for the execution function.

On Tue, Apr 13, 2021 at 11:13:51AM +0800, Yang Weijiang wrote:
> On Mon, Apr 12, 2021 at 06:30:58PM +0000, Sean Christopherson wrote:
> > On Mon, Apr 12, 2021, Yang Weijiang wrote:
> > > On Fri, Apr 09, 2021 at 06:59:45PM +0000, Sean Christopherson wrote:
> > > > On Fri, Apr 09, 2021, Paolo Bonzini wrote:
> > > > > On 09/04/21 09:55, Yang Weijiang wrote:
> > > > > > During kvm-unit-test, below failure pattern is observed, this is due to testing thread
> > > > > > migration + cache "lazy" flush during test, so forcely flush the cache to avoid the issue.
> > > > > > Pin the test app to certain physical CPU can fix the issue as well. The error report is
> > > > > > misleading, pke is the victim of the issue.
> > > > > > 
> > > > > > test user cr4.pke: FAIL: error code 5 expected 4
> > > > > > Dump mapping: address: 0x123400000000
> > > > > > ------L4: 21ea007
> > > > > > ------L3: 21eb007
> > > > > > ------L2: 21ec000
> > > > > > ------L1: 2000000
> > > > > > 
> > > > > > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > > > > > ---
> > > > > >   x86/access.c | 2 ++
> > > > > >   1 file changed, 2 insertions(+)
> > > > > > 
> > > > > > diff --git a/x86/access.c b/x86/access.c
> > > > > > index 7dc9eb6..379d533 100644
> > > > > > --- a/x86/access.c
> > > > > > +++ b/x86/access.c
> > > > > > @@ -211,6 +211,8 @@ static unsigned set_cr4_smep(int smep)
> > > > > >           ptl2[2] |= PT_USER_MASK;
> > > > > >       if (!r)
> > > > > >           shadow_cr4 = cr4;
> > > > > > +
> > > > > > +    invlpg((void *)(ptl2[2] & ~PAGE_SIZE));
> > > > > >       return r;
> > > > > >   }
> > > > > > 
> > > > > 
> > > > > Applied, thanks.
> > > > 
> > > > Egad, I can't keep up with this new Paolo :-D
> > > > 
> > > > 
> > > > Would it also work to move the existing invlpg() into ac_test_do_access()?
> > > >
> > > Hi, Sean,
> > > You patch works for the app on my side, but one thing makes my confused, my patch
> > > invalidates the mapping for test code(ac_test_do_access), but your patch invlidates
> > > at->virt, they're not mapped to the same page. Why it works?
> > 
> > I don't know why your patch works.  Best guess is that INVLPG on the PMD is
> > causing the CPU to flush the entire TLB, i.e. the problematic entry is collateral
> > damage.
> Hi, Sean,
> Maybe you've figured out the root cause, just attached the patch to let you know what I did
> for comparison, toggling two different invlpg() got significant different results.
> 
> > 
> > > I simplified the test by only executing two patterns as below:
> > > 
> > > printf("\n############# start test ############\n\n");
> > > at.flags = 0x8000000;
> > > ac_test_exec(&at, &pool);
> > > at.flags = 0x200000; /* or 0x10200000 */
> > > ac_test_exec(&at, &pool);
> > > printf("############# end test ############\n\n");
> > > 
> > > with your patch I still got error code 5 while getting  error code 4 with my patch.
> > > What makes it different?
> > 
> > Now I'm really confused.  This:
> > 
> >   at.flags = 0x8000000;                                                         
> >   ac_test_exec(&at, &pool);
> > 
> > runs the test with a not-present PTE.  I don't understand how you are getting
> > error code '5' (USER + PRESENT) when running the user test; there shouldn't be
> > anything for at->virt in the TLB.
> error code '5' is (USER + PG protection violation) :-)
> 
> > 
> > Are there tests being run before this point?
> What I changed is included in the patch.
> 
> > Even then, explicitly flushing
> > at->virt should work.  The fact that set_cr4_smep() modifies a PMD and not the
> > leaf PTE should be irrelevant.  Per the SDM:
> > 
> >   INVLPG also invalidates all entries in all paging-structure caches associated
> >   with the current PCID, regardless of the linear addresses to which they correspond.
> Even PCID is not enabled in CR4? If it's the case, it cannot tell why two
> different flavor of invalidation makes different results.
> 
> Thanks for the comments!

> diff --git a/x86/access.c b/x86/access.c
> index 7dc9eb6..c7bb009 100644
> --- a/x86/access.c
> +++ b/x86/access.c
> @@ -638,6 +638,7 @@ static int ac_test_do_access(ac_test_t *at)
>      unsigned long rsp;
>      _Bool success = true;
>      int flags = at->flags;
> +    extern u64 ptl2[];
>  
>      ++unique;
>      if (!(unique & 65535)) {
> @@ -670,6 +671,8 @@ static int ac_test_do_access(ac_test_t *at)
>  	fault = 0;
>      }
>  
> +    //invlpg((void *)(ptl2[2] & ~PAGE_SIZE));
> +    invlpg(at->virt);
>      asm volatile ("mov $fixed1, %%rsi \n\t"
>  		  "mov %%rsp, %%rdx \n\t"
>  		  "cmp $0, %[user] \n\t"
> @@ -740,6 +743,8 @@ static int ac_test_do_access(ac_test_t *at)
>              printf("PASS\n");
>  	}
>      }
> +    printf("cr0 = 0x%lx, cr3 = 0x%lx, cr4 = 0x%lx\n", read_cr0(), read_cr3(), read_cr4());
> +    printf("at->flags = 0x%x, error = 0x%x, fault = 0x%x\n\n", at->flags, e, fault);
>      return success;
>  }
>  
> @@ -955,7 +960,7 @@ static int ac_test_run(void)
>  {
>      ac_test_t at;
>      ac_pool_t pool;
> -    int i, tests, successes;
> +    int tests, successes;
>  
>      printf("run\n");
>      tests = successes = 0;
> @@ -1018,7 +1023,17 @@ static int ac_test_run(void)
>  
>      ac_env_int(&pool);
>      ac_test_init(&at, (void *)(0x123400000000 + 16 * smp_id()));
> -    do {
> +//////////////////////////////////////////////////////////////
> +    printf("\n############# start test ############\n\n");
> +    at.flags = 0x8000000 - 1;
> +    ac_test_bump(&at);
> +    ac_test_exec(&at, &pool);
> +    at.flags = 0x200000 - 1;
> +    ac_test_bump(&at);
> +    ac_test_exec(&at, &pool);
> +    printf("\n############# end test ############\n\n");
> +//////////////////////////////////////////////////////////////
> +    /*do {
>  	++tests;
>  	successes += ac_test_exec(&at, &pool);
>      } while (ac_test_bump(&at));
> @@ -1026,7 +1041,7 @@ static int ac_test_run(void)
>      for (i = 0; i < ARRAY_SIZE(ac_test_cases); i++) {
>  	++tests;
>  	successes += ac_test_cases[i](&pool);
> -    }
> +    }*/
>  
>      printf("\n%d tests, %d failures\n", tests, tests - successes);
>  


[-- Attachment #2: access.patch --]
[-- Type: text/plain, Size: 1822 bytes --]

diff --git a/x86/access.c b/x86/access.c
index 7dc9eb6..87985fe 100644
--- a/x86/access.c
+++ b/x86/access.c
@@ -670,6 +670,8 @@ static int ac_test_do_access(ac_test_t *at)
 	fault = 0;
     }
 
+    invlpg(ac_test_do_access);
+    //invlpg(at->virt);
     asm volatile ("mov $fixed1, %%rsi \n\t"
 		  "mov %%rsp, %%rdx \n\t"
 		  "cmp $0, %[user] \n\t"
@@ -740,6 +742,8 @@ static int ac_test_do_access(ac_test_t *at)
             printf("PASS\n");
 	}
     }
+    printf("cr0 = 0x%lx, cr3 = 0x%lx, cr4 = 0x%lx\n", read_cr0(), read_cr3(), read_cr4());
+    printf("ac_test_do_access = %p, at->flags = 0x%x, error = 0x%x, fault = 0x%x\n\n", ac_test_do_access, at->flags, e, fault);
     return success;
 }
 
@@ -955,7 +959,7 @@ static int ac_test_run(void)
 {
     ac_test_t at;
     ac_pool_t pool;
-    int i, tests, successes;
+    int tests, successes;
 
     printf("run\n");
     tests = successes = 0;
@@ -1018,7 +1022,17 @@ static int ac_test_run(void)
 
     ac_env_int(&pool);
     ac_test_init(&at, (void *)(0x123400000000 + 16 * smp_id()));
-    do {
+//////////////////////////////////////////////////////////////
+    printf("\n############# start test ############\n\n");
+    at.flags = 0x8000000 - 1;
+    ac_test_bump(&at);
+    ac_test_exec(&at, &pool);
+    at.flags = 0x200000 - 1;
+    ac_test_bump(&at);
+    ac_test_exec(&at, &pool);
+    printf("\n############# end test ############\n\n");
+//////////////////////////////////////////////////////////////
+    /*do {
 	++tests;
 	successes += ac_test_exec(&at, &pool);
     } while (ac_test_bump(&at));
@@ -1026,7 +1040,7 @@ static int ac_test_run(void)
     for (i = 0; i < ARRAY_SIZE(ac_test_cases); i++) {
 	++tests;
 	successes += ac_test_cases[i](&pool);
-    }
+    }*/
 
     printf("\n%d tests, %d failures\n", tests, tests - successes);
 

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-04-13  4:31 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-09  7:55 [PATCH] [kvm-unit-tests PATCH] x86/access: Fix intermittent test failure Yang Weijiang
2021-04-09 17:29 ` Paolo Bonzini
2021-04-09 18:59   ` Sean Christopherson
2021-04-10 14:16     ` Paolo Bonzini
2021-04-12 13:25     ` Yang Weijiang
2021-04-12 18:30       ` Sean Christopherson
2021-04-13  3:13         ` Yang Weijiang
2021-04-13  4:44           ` Yang Weijiang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).