From: Alexander Graf <agraf@suse.de>
To: kvm-ppc@vger.kernel.org
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
KVM list <kvm@vger.kernel.org>
Subject: [PATCH 27/27] KVM: PPC: Add Documentation about PV interface
Date: Thu, 1 Jul 2010 12:43:02 +0200 [thread overview]
Message-ID: <1277980982-12433-28-git-send-email-agraf@suse.de> (raw)
In-Reply-To: <1277980982-12433-1-git-send-email-agraf@suse.de>
We just introduced a new PV interface that screams for documentation. So here
it is - a shiny new and awesome text file describing the internal works of
the PPC KVM paravirtual interface.
Signed-off-by: Alexander Graf <agraf@suse.de>
---
v1 -> v2:
- clarify guest implementation
- clarify that privileged instructions still work
- explain safe MSR bits
- Fix dsisr patch description
- change hypervisor calls to use new register values
---
Documentation/kvm/ppc-pv.txt | 185 ++++++++++++++++++++++++++++++++++++++++++
1 files changed, 185 insertions(+), 0 deletions(-)
create mode 100644 Documentation/kvm/ppc-pv.txt
diff --git a/Documentation/kvm/ppc-pv.txt b/Documentation/kvm/ppc-pv.txt
new file mode 100644
index 0000000..82de6c6
--- /dev/null
+++ b/Documentation/kvm/ppc-pv.txt
@@ -0,0 +1,185 @@
+The PPC KVM paravirtual interface
+=================================
+
+The basic execution principle by which KVM on PowerPC works is to run all kernel
+space code in PR=1 which is user space. This way we trap all privileged
+instructions and can emulate them accordingly.
+
+Unfortunately that is also the downfall. There are quite some privileged
+instructions that needlessly return us to the hypervisor even though they
+could be handled differently.
+
+This is what the PPC PV interface helps with. It takes privileged instructions
+and transforms them into unprivileged ones with some help from the hypervisor.
+This cuts down virtualization costs by about 50% on some of my benchmarks.
+
+The code for that interface can be found in arch/powerpc/kernel/kvm*
+
+Querying for existence
+======================
+
+To find out if we're running on KVM or not, we overlay the PVR register. Usually
+the PVR register contains an id that identifies your CPU type. If, however, you
+pass KVM_PVR_PARA in the register that you want the PVR result in, the register
+still contains KVM_PVR_PARA after the mfpvr call.
+
+ LOAD_REG_IMM(r5, KVM_PVR_PARA)
+ mfpvr r5
+ [r5 still contains KVM_PVR_PARA]
+
+Once determined to run under a PV capable KVM, you can now use hypercalls as
+described below.
+
+PPC hypercalls
+==============
+
+The only viable ways to reliably get from guest context to host context are:
+
+ 1) Call an invalid instruction
+ 2) Call the "sc" instruction with a parameter to "sc"
+ 3) Call the "sc" instruction with parameters in GPRs
+
+Method 1 is always a bad idea. Invalid instructions can be replaced later on
+by valid instructions, rendering the interface broken.
+
+Method 2 also has downfalls. If the parameter to "sc" is != 0 the spec is
+rather unclear if the sc is targeted directly for the hypervisor or the
+supervisor. It would also require that we read the syscall issuing instruction
+every time a syscall is issued, slowing down guest syscalls.
+
+Method 3 is what KVM uses. We pass magic constants (KVM_SC_MAGIC_R0 and
+KVM_SC_MAGIC_R3) in r0 and r3 respectively. If a syscall instruction with these
+magic values arrives from the guest's kernel mode, we take the syscall as a
+hypercall.
+
+The parameters are as follows:
+
+ r0 KVM_SC_MAGIC_R0
+ r3 KVM_SC_MAGIC_R3 Return code
+ r4 Hypercall number
+ r5 First parameter
+ r6 Second parameter
+ r7 Third parameter
+ r8 Fourth parameter
+
+Hypercall definitions are shared in generic code, so the same hypercall numbers
+apply for x86 and powerpc alike.
+
+The magic page
+==============
+
+To enable communication between the hypervisor and guest there is a new shared
+page that contains parts of supervisor visible register state. The guest can
+map this shared page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE.
+
+With this hypercall issued the guest always gets the magic page mapped at the
+desired location in effective and physical address space. For now, we always
+map the page to -4096. This way we can access it using absolute load and store
+functions. The following instruction reads the first field of the magic page:
+
+ ld rX, -4096(0)
+
+The interface is designed to be extensible should there be need later to add
+additional registers to the magic page. If you add fields to the magic page,
+also define a new hypercall feature to indicate that the host can give you more
+registers. Only if the host supports the additional features, make use of them.
+
+The magic page has the following layout as described in
+arch/powerpc/include/asm/kvm_para.h:
+
+struct kvm_vcpu_arch_shared {
+ __u64 scratch1;
+ __u64 scratch2;
+ __u64 scratch3;
+ __u64 critical; /* Guest may not get interrupts if == r1 */
+ __u64 sprg0;
+ __u64 sprg1;
+ __u64 sprg2;
+ __u64 sprg3;
+ __u64 srr0;
+ __u64 srr1;
+ __u64 dar;
+ __u64 msr;
+ __u32 dsisr;
+ __u32 int_pending; /* Tells the guest if we have an interrupt */
+};
+
+Additions to the page must only occur at the end. Struct fields are always 32
+bit aligned.
+
+MSR bits
+========
+
+The MSR contains bits that require hypervisor intervention and bits that do
+not require direct hypervisor intervention because they only get interpreted
+when entering the guest or don't have any impact on the hypervisor's behavior.
+
+The following bits are safe to be set inside the guest:
+
+ MSR_EE
+ MSR_RI
+ MSR_CR
+ MSR_ME
+
+If any other bit changes in the MSR, please still use mtmsr(d).
+
+Patched instructions
+====================
+
+The "ld" and "std" instructions are transormed to "lwz" and "stw" instructions
+respectively on 32 bit systems with an added offset of 4 to accomodate for big
+endianness.
+
+The following is a list of mapping the Linux kernel performs when running as
+guest. Implementing any of those mappings is optional, as the instruction traps
+also act on the shared page. So calling privileged instructions still works as
+before.
+
+From To
+==== ==
+
+mfmsr rX ld rX, magic_page->msr
+mfsprg rX, 0 ld rX, magic_page->sprg0
+mfsprg rX, 1 ld rX, magic_page->sprg1
+mfsprg rX, 2 ld rX, magic_page->sprg2
+mfsprg rX, 3 ld rX, magic_page->sprg3
+mfsrr0 rX ld rX, magic_page->srr0
+mfsrr1 rX ld rX, magic_page->srr1
+mfdar rX ld rX, magic_page->dar
+mfdsisr rX lwz rX, magic_page->dsisr
+
+mtmsr rX std rX, magic_page->msr
+mtsprg 0, rX std rX, magic_page->sprg0
+mtsprg 1, rX std rX, magic_page->sprg1
+mtsprg 2, rX std rX, magic_page->sprg2
+mtsprg 3, rX std rX, magic_page->sprg3
+mtsrr0 rX std rX, magic_page->srr0
+mtsrr1 rX std rX, magic_page->srr1
+mtdar rX std rX, magic_page->dar
+mtdsisr rX stw rX, magic_page->dsisr
+
+tlbsync nop
+
+mtmsrd rX, 0 b <special mtmsr section>
+mtmsr b <special mtmsr section>
+
+mtmsrd rX, 1 b <special mtmsrd section>
+
+[BookE only]
+wrteei [0|1] b <special wrteei section>
+
+
+Some instructions require more logic to determine what's going on than a load
+or store instruction can deliver. To enable patching of those, we keep some
+RAM around where we can live translate instructions to. What happens is the
+following:
+
+ 1) copy emulation code to memory
+ 2) patch that code to fit the emulated instruction
+ 3) patch that code to return to the original pc + 4
+ 4) patch the original instruction to branch to the new code
+
+That way we can inject an arbitrary amount of code as replacement for a single
+instruction. This allows us to check for pending interrupts when setting EE=1
+for example.
+
--
1.6.0.2
next prev parent reply other threads:[~2010-07-01 10:43 UTC|newest]
Thread overview: 57+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-07-01 10:42 [PATCH 00/27] KVM PPC PV framework Alexander Graf
2010-07-01 10:42 ` [PATCH 01/27] KVM: PPC: Introduce shared page Alexander Graf
2010-07-01 10:42 ` [PATCH 02/27] KVM: PPC: Convert MSR to " Alexander Graf
2010-07-01 10:42 ` [PATCH 03/27] KVM: PPC: Convert DSISR " Alexander Graf
2010-07-01 10:42 ` [PATCH 04/27] KVM: PPC: Convert DAR " Alexander Graf
2010-07-01 10:42 ` [PATCH 05/27] KVM: PPC: Convert SRR0 and SRR1 " Alexander Graf
2010-07-01 10:42 ` [PATCH 06/27] KVM: PPC: Convert SPRG[0-4] " Alexander Graf
2010-07-01 10:42 ` [PATCH 07/27] KVM: PPC: Implement hypervisor interface Alexander Graf
2010-07-01 10:42 ` [PATCH 08/27] KVM: PPC: Add PV guest critical sections Alexander Graf
2010-07-01 10:42 ` [PATCH 09/27] KVM: PPC: Add PV guest scratch registers Alexander Graf
2010-07-01 10:42 ` [PATCH 10/27] KVM: PPC: Tell guest about pending interrupts Alexander Graf
2010-07-01 10:42 ` [PATCH 11/27] KVM: PPC: Make RMO a define Alexander Graf
2010-07-02 16:23 ` Segher Boessenkool
2010-07-01 10:42 ` [PATCH 12/27] KVM: PPC: First magic page steps Alexander Graf
2010-07-01 10:42 ` [PATCH 13/27] KVM: PPC: Magic Page Book3s support Alexander Graf
2010-07-02 15:37 ` Alexander Graf
2010-07-04 9:42 ` Avi Kivity
2010-07-01 10:42 ` [PATCH 14/27] KVM: PPC: Magic Page BookE support Alexander Graf
2010-07-01 11:18 ` Josh Boyer
2010-07-01 12:25 ` Alexander Graf
2010-07-12 11:24 ` Liu Yu-B13201
2010-07-01 10:42 ` [PATCH 15/27] KVM: PPC: Expose magic page support to guest Alexander Graf
2010-07-01 10:42 ` [PATCH 16/27] KVM: Move kvm_guest_init out of generic code Alexander Graf
2010-07-02 7:41 ` Geert Uytterhoeven
2010-07-02 7:44 ` Alexander Graf
2010-07-01 10:42 ` [PATCH 17/27] KVM: PPC: Generic KVM PV guest support Alexander Graf
2010-07-01 10:42 ` [PATCH 18/27] KVM: PPC: KVM PV guest stubs Alexander Graf
2010-07-01 10:42 ` [PATCH 19/27] KVM: PPC: PV instructions to loads and stores Alexander Graf
2010-07-01 10:42 ` [PATCH 20/27] KVM: PPC: PV tlbsync to nop Alexander Graf
2010-07-01 10:42 ` [PATCH 21/27] KVM: PPC: Introduce kvm_tmp framework Alexander Graf
2010-07-01 10:42 ` [PATCH 22/27] KVM: PPC: Introduce branch patching helper Alexander Graf
2010-07-01 10:42 ` [PATCH 23/27] KVM: PPC: PV assembler helpers Alexander Graf
2010-07-01 10:42 ` [PATCH 24/27] KVM: PPC: PV mtmsrd L=1 Alexander Graf
2010-07-01 10:43 ` [PATCH 25/27] KVM: PPC: PV mtmsrd L=0 and mtmsr Alexander Graf
2010-07-01 10:43 ` [PATCH 26/27] KVM: PPC: PV wrteei Alexander Graf
2010-07-01 10:43 ` Alexander Graf [this message]
2010-07-02 16:27 ` [PATCH 27/27] KVM: PPC: Add Documentation about PV interface Segher Boessenkool
2010-07-02 18:41 ` Alexander Graf
2010-07-03 22:42 ` Benjamin Herrenschmidt
2010-07-04 9:04 ` Alexander Graf
2010-07-03 22:41 ` Benjamin Herrenschmidt
2010-07-04 9:04 ` Alexander Graf
2010-07-04 9:10 ` Avi Kivity
2010-07-04 9:17 ` Alexander Graf
2010-07-04 9:30 ` Alexander Graf
2010-07-04 9:41 ` Avi Kivity
2010-07-04 9:37 ` Avi Kivity
2010-07-02 17:59 ` Hollis Blanchard
2010-07-02 18:47 ` Alexander Graf
2010-07-02 19:10 ` Scott Wood
2010-07-04 9:02 ` Alexander Graf
2010-07-09 9:11 ` MJ embd
2010-07-09 9:15 ` Alexander Graf
2010-07-02 16:22 ` [PATCH 00/27] KVM PPC PV framework Segher Boessenkool
2010-07-02 16:59 ` Alexander Graf
2010-07-09 4:57 ` MJ embd
2010-07-09 6:33 ` Alexander Graf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1277980982-12433-28-git-send-email-agraf@suse.de \
--to=agraf@suse.de \
--cc=kvm-ppc@vger.kernel.org \
--cc=kvm@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).