linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Brijesh Singh <brijesh.singh@amd.com>
To: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: <brijesh.singh@amd.com>, <pbonzini@redhat.com>, <joro@8bytes.org>,
	<kvm@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<tglx@linutronix.de>, <mingo@redhat.com>, <hpa@zytor.com>,
	<x86@kernel.org>, <Thomas.Lendacky@amd.com>
Subject: Re: [PATCH] x86: kvm: Avoid guest page table walk when gpa_available is set
Date: Tue, 25 Apr 2017 17:02:58 -0500	[thread overview]
Message-ID: <6e453f35-cd26-1df8-5f8e-68fa09c6a1a3@amd.com> (raw)
In-Reply-To: <20170425140351.GF5713@potion>


>>
>> I also wanted to avoid adding yet another variable but we can't depend on
>> cr2 parameters passed into x86_emulate_instruction().
>>
>> The x86_emulate_instruction() function is called from two places:
>>
>> 1) handling the page-fault.
>> pf_interception [svm.c]
>>  kvm_mmu_page_fault [mmu.c]
>>   x86_emulate_instruction [x86.c]
>>
>> 2) completing the IO/MMIO's from previous instruction decode
>> kvm_arch_vcpu_ioctl_run
>>  complete_emulated_io
>>    emulate_instruction
>>     x86_emulate_instruction(vcpu, 0, emulation_type, NULL, 0)
>>
>> In #1, we are guaranteed that cr2 variable will contain a valid GPA but
>> in #2, CR2 is set to zero.
>
> We are setting up the completion in #1 x86_emulate_instruction(), where
> the gpa (cr2) is available, so we could store the value while arming
> vcpu->arch.complete_userspace_io.
>
> emulator_read_write_onepage() already saves gpa in frag->gpa, which is
> then passed into complete_emulated_mmio -- isn't that mechanism
> sufficient?
>

I see that complete_emulated_mmio() saves the frag>gpa into run->mmio.phys_addr,
so based on the exit_reason we should be able to get the saved gpa.
In my debug patch below, I tried doing something similar to verify that frag->gpa
contains the valid CR2 value but I saw a bunch of mismatch. So it seems like we
may not able to use frag->gpa mechanism. Additionally we also need to handle the
PIO cases (e.g what if we are called from complete_emulated_pio), which also takes
similar code path

complete_emulated_pio
  completed_emulated_io
    emulate_instruction
     x86_emulate_instruction(vcpu, 0, emulation_type, NULL, 0)

@@ -5682,13 +5686,20 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
  
  restart:
         /*
-        * Save the faulting GPA (cr2) in the address field
-        * NOTE: If gpa_available is set then gpa_val will contain a valid GPA
+        * if previous exit was due to userspace mmio completion then actual
+        * cr2 is stored in mmio.phys_addr.
          */
-       if (vcpu->arch.gpa_available)
-               ctxt->exception.address = vcpu->arch.gpa_val;
-       else
-               ctxt->exception.address = cr2;
+       if (vcpu->run->exit_reason == KVM_EXIT_MMIO) {
+               cr2 = vcpu->run->mmio.phys_addr;
+               if (cr2 != vcpu->arch.gpa_val)
+                       pr_err("** mismatch %llx %llx\n",
+                               vcpu->run->mmio.phys_addr, vcpu->arch.gpa_val);
+       }
+
+        /* Save the faulting GPA (cr2) in the address field */
+       ctxt->exception.address = cr2;


>>
>> handle_exit [svm.c]
>>  pf_interception  [svm.c]
>> /* it invokes the fault handler with CR2 = svm->vmcb->control.exit_info_2 */
>>   kvm_mmu_page_fault [mmu.c]
>>    x86_emulate_instruction  [x86.c]
>>     emulator_read_write_onepage  [x86.c]
>>      /*
>>       *this is where we walk the guest page table to translate
>>       * a GVA to GPA. If gpa_available is set then we use the
>>       * gpa_val instead of walking the pgtable.
>>       */
>
> pf_interception is the NPF exit handler -- please move the setting
> there, at least.  handle_exit() is a hot path that shouldn't contain
> code that isn't applicable to all exits.
>

Sure, Will do.

> Btw. we need some other guarantees to declare it as GPA (cr2 is GPA in
> NPT exits, but might not be in other) ... isn't arch.mmu.direct_map a
> condition we are interested in?
>
> The other code uses it to interpret cr2 directly as gpa, so we might be
> able to avoid setting the arch.gpa_available in a hot path too.
>

Hmm looking at the call trace I am not sure how arch.mmu_direct_map will help
but I will investigate a bit more.

>>
>> See my previous comment. In some cases CR2 may be set to zero
>> (e.g when completing the instruction from previous io/mmio page-fault).
>>
>> If we are decide to add the gpa_val then we can remove above if
>> statement from x86_emulate_instruction() and update emulator_read_write_onepage
>> to use the vcpu->arch.gpa_val instead of exception->address.
>
> Yeah, that would be nicer than setting exception->address at a random
> place.
>
> We could also pass the value as cr2 in all cases if it made something
> better.
>
>> if (vcpu->arch.gpa_available &&
>>      emulator_can_use_gpa(ctxt) &&
>>      (addr & ~PAGE_MASK) == (exception->address & ~PAGE_MASK)) {
>>       gpa = vcpu=>arch.gpa_val;
>>       ...
>>       ...
>>    }
>
> If at all possible, I'd like to have the gpa passed with other relevant
> data, instead of having it isolated like this ... and we can't manage
> that, then at least good benchmark results to excuse the bad code.
>

I ran two tests to see how many times we walk  guest page table.

Test1: run kvm-unit-test
Test2: launch Ubuntu 16.06 guest, run a stressapptest for 20 seconds, shutdown the VM

Before patch
   * Test1: 10419
   * Test2: 243365

After patch:
   * Test1:  1259
   * Test2:  1221

Please let me know if you want me to run other other benchmark and capture the data.

-Brijesh

  reply	other threads:[~2017-04-25 22:03 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-24 15:52 [PATCH] x86: kvm: Avoid guest page table walk when gpa_available is set Brijesh Singh
2017-04-24 16:16 ` Brijesh Singh
2017-04-24 20:52 ` Radim Krčmář
2017-04-24 22:14   ` Brijesh Singh
2017-04-25 14:03     ` Radim Krčmář
2017-04-25 22:02       ` Brijesh Singh [this message]
2017-04-26 20:44         ` Radim Krčmář
2017-04-28 19:15           ` Brijesh Singh
2017-05-02 16:24             ` Paolo Bonzini
2017-05-02 21:44               ` Brijesh Singh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6e453f35-cd26-1df8-5f8e-68fa09c6a1a3@amd.com \
    --to=brijesh.singh@amd.com \
    --cc=Thomas.Lendacky@amd.com \
    --cc=hpa@zytor.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=rkrcmar@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).