All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yang Weijiang <weijiang.yang@intel.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Yang Weijiang <weijiang.yang@intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org
Subject: Re: [PATCH] [kvm-unit-tests PATCH] x86/access: Fix intermittent test failure
Date: Tue, 13 Apr 2021 11:13:51 +0800	[thread overview]
Message-ID: <20210413031351.GA22945@local-michael-cet-test.sh.intel.com> (raw)
In-Reply-To: <YHSR4lWFupf6m9sv@google.com>

[-- Attachment #1: Type: text/plain, Size: 3727 bytes --]

On Mon, Apr 12, 2021 at 06:30:58PM +0000, Sean Christopherson wrote:
> On Mon, Apr 12, 2021, Yang Weijiang wrote:
> > On Fri, Apr 09, 2021 at 06:59:45PM +0000, Sean Christopherson wrote:
> > > On Fri, Apr 09, 2021, Paolo Bonzini wrote:
> > > > On 09/04/21 09:55, Yang Weijiang wrote:
> > > > > During kvm-unit-test, below failure pattern is observed, this is due to testing thread
> > > > > migration + cache "lazy" flush during test, so forcely flush the cache to avoid the issue.
> > > > > Pin the test app to certain physical CPU can fix the issue as well. The error report is
> > > > > misleading, pke is the victim of the issue.
> > > > > 
> > > > > test user cr4.pke: FAIL: error code 5 expected 4
> > > > > Dump mapping: address: 0x123400000000
> > > > > ------L4: 21ea007
> > > > > ------L3: 21eb007
> > > > > ------L2: 21ec000
> > > > > ------L1: 2000000
> > > > > 
> > > > > Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> > > > > ---
> > > > >   x86/access.c | 2 ++
> > > > >   1 file changed, 2 insertions(+)
> > > > > 
> > > > > diff --git a/x86/access.c b/x86/access.c
> > > > > index 7dc9eb6..379d533 100644
> > > > > --- a/x86/access.c
> > > > > +++ b/x86/access.c
> > > > > @@ -211,6 +211,8 @@ static unsigned set_cr4_smep(int smep)
> > > > >           ptl2[2] |= PT_USER_MASK;
> > > > >       if (!r)
> > > > >           shadow_cr4 = cr4;
> > > > > +
> > > > > +    invlpg((void *)(ptl2[2] & ~PAGE_SIZE));
> > > > >       return r;
> > > > >   }
> > > > > 
> > > > 
> > > > Applied, thanks.
> > > 
> > > Egad, I can't keep up with this new Paolo :-D
> > > 
> > > 
> > > Would it also work to move the existing invlpg() into ac_test_do_access()?
> > >
> > Hi, Sean,
> > You patch works for the app on my side, but one thing makes my confused, my patch
> > invalidates the mapping for test code(ac_test_do_access), but your patch invlidates
> > at->virt, they're not mapped to the same page. Why it works?
> 
> I don't know why your patch works.  Best guess is that INVLPG on the PMD is
> causing the CPU to flush the entire TLB, i.e. the problematic entry is collateral
> damage.
Hi, Sean,
Maybe you've figured out the root cause, just attached the patch to let you know what I did
for comparison, toggling two different invlpg() got significant different results.

> 
> > I simplified the test by only executing two patterns as below:
> > 
> > printf("\n############# start test ############\n\n");
> > at.flags = 0x8000000;
> > ac_test_exec(&at, &pool);
> > at.flags = 0x200000; /* or 0x10200000 */
> > ac_test_exec(&at, &pool);
> > printf("############# end test ############\n\n");
> > 
> > with your patch I still got error code 5 while getting  error code 4 with my patch.
> > What makes it different?
> 
> Now I'm really confused.  This:
> 
>   at.flags = 0x8000000;                                                         
>   ac_test_exec(&at, &pool);
> 
> runs the test with a not-present PTE.  I don't understand how you are getting
> error code '5' (USER + PRESENT) when running the user test; there shouldn't be
> anything for at->virt in the TLB.
error code '5' is (USER + PG protection violation) :-)

> 
> Are there tests being run before this point?
What I changed is included in the patch.

> Even then, explicitly flushing
> at->virt should work.  The fact that set_cr4_smep() modifies a PMD and not the
> leaf PTE should be irrelevant.  Per the SDM:
> 
>   INVLPG also invalidates all entries in all paging-structure caches associated
>   with the current PCID, regardless of the linear addresses to which they correspond.
Even PCID is not enabled in CR4? If it's the case, it cannot tell why two
different flavor of invalidation makes different results.

Thanks for the comments!

[-- Attachment #2: access.patch --]
[-- Type: text/plain, Size: 2006 bytes --]

diff --git a/x86/access.c b/x86/access.c
index 7dc9eb6..c7bb009 100644
--- a/x86/access.c
+++ b/x86/access.c
@@ -638,6 +638,7 @@ static int ac_test_do_access(ac_test_t *at)
     unsigned long rsp;
     _Bool success = true;
     int flags = at->flags;
+    extern u64 ptl2[];
 
     ++unique;
     if (!(unique & 65535)) {
@@ -670,6 +671,8 @@ static int ac_test_do_access(ac_test_t *at)
 	fault = 0;
     }
 
+    //invlpg((void *)(ptl2[2] & ~PAGE_SIZE));
+    invlpg(at->virt);
     asm volatile ("mov $fixed1, %%rsi \n\t"
 		  "mov %%rsp, %%rdx \n\t"
 		  "cmp $0, %[user] \n\t"
@@ -740,6 +743,8 @@ static int ac_test_do_access(ac_test_t *at)
             printf("PASS\n");
 	}
     }
+    printf("cr0 = 0x%lx, cr3 = 0x%lx, cr4 = 0x%lx\n", read_cr0(), read_cr3(), read_cr4());
+    printf("at->flags = 0x%x, error = 0x%x, fault = 0x%x\n\n", at->flags, e, fault);
     return success;
 }
 
@@ -955,7 +960,7 @@ static int ac_test_run(void)
 {
     ac_test_t at;
     ac_pool_t pool;
-    int i, tests, successes;
+    int tests, successes;
 
     printf("run\n");
     tests = successes = 0;
@@ -1018,7 +1023,17 @@ static int ac_test_run(void)
 
     ac_env_int(&pool);
     ac_test_init(&at, (void *)(0x123400000000 + 16 * smp_id()));
-    do {
+//////////////////////////////////////////////////////////////
+    printf("\n############# start test ############\n\n");
+    at.flags = 0x8000000 - 1;
+    ac_test_bump(&at);
+    ac_test_exec(&at, &pool);
+    at.flags = 0x200000 - 1;
+    ac_test_bump(&at);
+    ac_test_exec(&at, &pool);
+    printf("\n############# end test ############\n\n");
+//////////////////////////////////////////////////////////////
+    /*do {
 	++tests;
 	successes += ac_test_exec(&at, &pool);
     } while (ac_test_bump(&at));
@@ -1026,7 +1041,7 @@ static int ac_test_run(void)
     for (i = 0; i < ARRAY_SIZE(ac_test_cases); i++) {
 	++tests;
 	successes += ac_test_cases[i](&pool);
-    }
+    }*/
 
     printf("\n%d tests, %d failures\n", tests, tests - successes);
 

  reply	other threads:[~2021-04-13  3:01 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-09  7:55 [PATCH] [kvm-unit-tests PATCH] x86/access: Fix intermittent test failure Yang Weijiang
2021-04-09 17:29 ` Paolo Bonzini
2021-04-09 18:59   ` Sean Christopherson
2021-04-10 14:16     ` Paolo Bonzini
2021-04-12 13:25     ` Yang Weijiang
2021-04-12 18:30       ` Sean Christopherson
2021-04-13  3:13         ` Yang Weijiang [this message]
2021-04-13  4:44           ` Yang Weijiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210413031351.GA22945@local-michael-cet-test.sh.intel.com \
    --to=weijiang.yang@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.