* [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4
@ 2009-09-29  8:17 Alexander Graf
  2009-09-30  8:42 ` Avi Kivity
                   ` (10 more replies)
  0 siblings, 11 replies; 244+ messages in thread
From: Alexander Graf @ 2009-09-29  8:17 UTC (permalink / raw)
  To: kvm-ppc

KVM for PowerPC only supports embedded cores at the moment.

While it makes sense to virtualize on small machines, it's even more fun
to do so on big boxes. So I figured we need KVM for PowerPC64 as well.

This patchset implements KVM support for Book3s_64 hosts and guest support
for Book3s_64 and G3/G4.

To really make use of this, you also need a recent version of qemu.


Don't want to apply patches? Get the git tree!

$ git clone git://csgraf.de/kvm
$ git checkout origin/ppc-v4

V1 -> V2:

 - extend sregs with padding
 - new naming scheme (ppc64 -> book3s_64; 74xx -> book3s_32)
 - to_phys -> in-kernel tophys()
 - loadimm -> LOAD_REG_IMMEDIATE
 - call .ko kvm.ko
 - set magic paca bit later
 - run guest code with PACA->soft_enabled=true
 - pt_regs for host state saving (guest too?)
 - only do HV dcbz trick on 970
 - refuse to run on LPAR because of missing SLB pieces

V2 -> V3:

 - fix DAR/DSISR saving
 - allow running on LPAR by modifying the SLB shadow
 - change the SLB implementation to use a mem-backed cache and do
   full world switch on enter/exit. gets rid of "context" magic
 - be more aggressive about DEC injection
 - remove fast ld/st because we're always in host context
 - don't use SPRGs in real->paged transition
 - implement dirty log
 - remove MMIO speedup code
 - SPRG cleanup
   - rename SPRG3 -> SPRN_SPRG_PACA
   - rename SPRG1 -> SPRN_SPRG_SCRATCH0
   - don't use SPRG2

V3 -> V4:

 - use context_id instead of mm_alloc
 - export less

TODO:

 - use MMU Notifiers

Alexander Graf (27):
  Move dirty logging code to sub-arch
  Pass PVR in sregs
  Add Book3s definitions
  Add Book3s fields to vcpu structs
  Add asm/kvm_book3s.h
  Add Book3s_64 intercept helpers
  Add book3s_64 highmem asm code
  Add SLB switching code for entry/exit
  Add interrupt handling code
  Add book3s.c
  Add book3s_64 Host MMU handling
  Add book3s_64 guest MMU
  Add book3s_32 guest MMU
  Add book3s_64 specific opcode emulation
  Add mfdec emulation
  Add desktop PowerPC specific emulation
  Make head_64.S aware of KVM real mode code
  Add Book3s_64 offsets to asm-offsets.c
  Export symbols for KVM module
  Split init_new_context and destroy_context
  Export KVM symbols for module
  Add fields to PACA
  Export new PACA constants in asm-offsets
  Include Book3s_64 target in buildsystem
  Fix trace.h
  Enable 32bit dirty log pointers on 64bit host
  Use Little Endian for Dirty Bitmap

 arch/powerpc/include/asm/exception-64s.h     |    2 +
 arch/powerpc/include/asm/kvm.h               |    2 +
 arch/powerpc/include/asm/kvm_asm.h           |   39 ++
 arch/powerpc/include/asm/kvm_book3s.h        |  136 ++++
 arch/powerpc/include/asm/kvm_book3s_64_asm.h |   58 ++
 arch/powerpc/include/asm/kvm_host.h          |   75 ++-
 arch/powerpc/include/asm/kvm_ppc.h           |    1 +
 arch/powerpc/include/asm/mmu_context.h       |    5 +
 arch/powerpc/include/asm/paca.h              |    9 +
 arch/powerpc/kernel/asm-offsets.c            |   18 +
 arch/powerpc/kernel/exceptions-64s.S         |    8 +
 arch/powerpc/kernel/head_64.S                |    7 +
 arch/powerpc/kernel/ppc_ksyms.c              |    3 +-
 arch/powerpc/kernel/time.c                   |    1 +
 arch/powerpc/kvm/Kconfig                     |   17 +
 arch/powerpc/kvm/Makefile                    |   27 +-
 arch/powerpc/kvm/book3s.c                    |  919 ++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_32_mmu.c             |  354 ++++++++++
 arch/powerpc/kvm/book3s_64_emulate.c         |  338 ++++++++++
 arch/powerpc/kvm/book3s_64_exports.c         |   24 +
 arch/powerpc/kvm/book3s_64_interrupts.S      |  392 +++++++++++
 arch/powerpc/kvm/book3s_64_mmu.c             |  469 +++++++++++++
 arch/powerpc/kvm/book3s_64_mmu_host.c        |  412 ++++++++++++
 arch/powerpc/kvm/book3s_64_rmhandlers.S      |  131 ++++
 arch/powerpc/kvm/book3s_64_slb.S             |  277 ++++++++
 arch/powerpc/kvm/booke.c                     |    5 +
 arch/powerpc/kvm/emulate.c                   |   43 ++-
 arch/powerpc/kvm/powerpc.c                   |    5 -
 arch/powerpc/kvm/trace.h                     |    6 +-
 arch/powerpc/mm/hash_utils_64.c              |    2 +
 arch/powerpc/mm/mmu_context_hash64.c         |   24 +-
 virt/kvm/kvm_main.c                          |   10 +-
 32 files changed, 3799 insertions(+), 20 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kvm_book3s.h
 create mode 100644 arch/powerpc/include/asm/kvm_book3s_64_asm.h
 create mode 100644 arch/powerpc/kvm/book3s.c
 create mode 100644 arch/powerpc/kvm/book3s_32_mmu.c
 create mode 100644 arch/powerpc/kvm/book3s_64_emulate.c
 create mode 100644 arch/powerpc/kvm/book3s_64_exports.c
 create mode 100644 arch/powerpc/kvm/book3s_64_interrupts.S
 create mode 100644 arch/powerpc/kvm/book3s_64_mmu.c
 create mode 100644 arch/powerpc/kvm/book3s_64_mmu_host.c
 create mode 100644 arch/powerpc/kvm/book3s_64_rmhandlers.S
 create mode 100644 arch/powerpc/kvm/book3s_64_slb.S



* Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4
  2009-09-29  8:17 [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4 Alexander Graf
@ 2009-09-30  8:42 ` Avi Kivity
  2009-09-30  8:47 ` Alexander Graf
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 244+ messages in thread
From: Avi Kivity @ 2009-09-30  8:42 UTC (permalink / raw)
  To: kvm-ppc

On 09/29/2009 10:17 AM, Alexander Graf wrote:
> KVM for PowerPC only supports embedded cores at the moment.
>
> While it makes sense to virtualize on small machines, it's even more fun
> to do so on big boxes. So I figured we need KVM for PowerPC64 as well.
>
> This patchset implements KVM support for Book3s_64 hosts and guest support
> for Book3s_64 and G3/G4.
>
> To really make use of this, you also need a recent version of qemu.
>    

Looks good to my non-ppc eyes.  I'd like to see this reviewed by the 
powerpc people, then it's good to go.


> TODO:
>
>   - use MMU Notifiers
>    


What's the plan here?  While not a requirement for merging, that's one 
of the kvm points of strength and I'd like to see it supported across 
the board.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4
  2009-09-29  8:17 [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4 Alexander Graf
  2009-09-30  8:42 ` Avi Kivity
@ 2009-09-30  8:47 ` Alexander Graf
  2009-09-30  8:59 ` Avi Kivity
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-09-30  8:47 UTC (permalink / raw)
  To: kvm-ppc


On 30.09.2009, at 10:42, Avi Kivity wrote:

> On 09/29/2009 10:17 AM, Alexander Graf wrote:
>> KVM for PowerPC only supports embedded cores at the moment.
>>
>> While it makes sense to virtualize on small machines, it's even  
>> more fun
>> to do so on big boxes. So I figured we need KVM for PowerPC64 as  
>> well.
>>
>> This patchset implements KVM support for Book3s_64 hosts and guest  
>> support
>> for Book3s_64 and G3/G4.
>>
>> To really make use of this, you also need a recent version of qemu.
>>
>
> Looks good to my non-ppc eyes.  I'd like to see this reviewed by the  
> powerpc people, then it's good to go.
>
>
>> TODO:
>>
>>  - use MMU Notifiers
>>
>
>
> What's the plan here?  While not a requirement for merging, that's  
> one of the kvm points of strength and I'd like to see it supported  
> across the board.

I'm having a deja vu :-).

The plan is to get qemu ppc64 guest support in a shape where it can  
actually use the KVM support. As it is it's rather useless.
When we have that, a PV interface would be needed to get things fast  
and then the next thing on my list is the MMU notifiers.

Alex



* Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4
  2009-09-29  8:17 [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4 Alexander Graf
  2009-09-30  8:42 ` Avi Kivity
  2009-09-30  8:47 ` Alexander Graf
@ 2009-09-30  8:59 ` Avi Kivity
  2009-09-30  9:11 ` Alexander Graf
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 244+ messages in thread
From: Avi Kivity @ 2009-09-30  8:59 UTC (permalink / raw)
  To: kvm-ppc

On 09/30/2009 10:47 AM, Alexander Graf wrote:
>>
>> What's the plan here?  While not a requirement for merging, that's 
>> one of the kvm points of strength and I'd like to see it supported 
>> across the board.
>
>
> I'm having a deja vu :-).

Will probably get one on every repost.

>
> The plan is to get qemu ppc64 guest support in a shape where it can 
> actually use the KVM support. As it is it's rather useless.
> When we have that, a PV interface would be needed to get things fast 
> and then the next thing on my list is the MMU notifiers.

Um.  How slow is it today?  What paths are problematic? mmu, context switch?

Our experience with pv on x86 has been mostly negative.  It's not 
trivial to get security right, it ended up slower than non-pv, and 
hardware obsoleted it fairly quickly.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4
  2009-09-29  8:17 [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4 Alexander Graf
                   ` (2 preceding siblings ...)
  2009-09-30  8:59 ` Avi Kivity
@ 2009-09-30  9:11 ` Alexander Graf
  2009-09-30  9:24 ` Avi Kivity
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-09-30  9:11 UTC (permalink / raw)
  To: kvm-ppc


On 30.09.2009, at 10:59, Avi Kivity wrote:

> On 09/30/2009 10:47 AM, Alexander Graf wrote:
>>>
>>> What's the plan here?  While not a requirement for merging, that's  
>>> one of the kvm points of strength and I'd like to see it supported  
>>> across the board.
>>
>>
>> I'm having a deja vu :-).
>
> Will probably get one on every repost.

Yippie :)

>> The plan is to get qemu ppc64 guest support in a shape where it can  
>> actually use the KVM support. As it is it's rather useless.
>> When we have that, a PV interface would be needed to get things  
>> fast and then the next thing on my list is the MMU notifiers.
>
> Um.  How slow is it today?  What paths are problematic? mmu, context  
> switch?

Instruction emulation.

X86 with virtualization extensions doesn't trap often, as most of the  
state can be safely handled within the guest mode.
Now with PPC we're basically running in "ring 3" (called "problem  
state" in ppc speech) which traps all the time because guests change  
the IF or access some SPRs that we don't really need to trap on, but  
only need to sync state with on #VMEXIT.

So the PV idea here is to have a shared page between host and guest  
that contains guest specific SPRs and other state (an MSR shadow for  
example). That way the guest can patch itself to use that shared page  
and KVM always knows about the most current state on #VMEXIT. At the  
same time we're reducing exits by a _lot_.
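
To give an idea of what I mean, the shared page could carry something
roughly like this (a sketch only; names and layout are made up here,
nothing is defined yet):

#include <linux/types.h>

/* Sketch of a possible shared ("magic") page layout - all names and
 * fields are invented for illustration, not a defined interface. */
struct kvmppc_magic_page {
	__u64 msr;		/* guest MSR shadow, synced by KVM on #VMEXIT */
	__u64 srr0;		/* interrupt return PC */
	__u64 srr1;		/* interrupt return MSR */
	__u64 sprg0;		/* scratch registers the guest kernel uses */
	__u64 sprg1;
	__u64 sprg2;
	__u64 sprg3;
	__u64 dar;		/* fault address of the last storage interrupt */
	__u64 dsisr;		/* and its status bits */
	__u32 int_pending;	/* host sets this when an interrupt is waiting */
};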

A short kvm_stat during boot of a ppc32 guest on ppc64 shows what I'm  
talking about:

  dec                       3224     168
  exits                 18957500 1037240
  ext_intr                    75       5
  halt_wakeup               6874       0
  inst_emu               8570503  818597
  ld                           0       0
  ld_slow                      0       0
  mmio                   8719444   26249
  pf_instruc              302572   35379
  pf_storage             9215970   86750
  queue_intr              354020   31482
  sig                       7244     188
  sp_instruc              302541   35365
  sp_storage              370002   45370
  st                           0       0
  st_slow                      0       0
  sysc                     57907    5342


As you can see the bulk of exits are from MMIO and emulation.

We certainly won't be able to get rid of all the emulation exits, but  
quite a bunch of them aren't really that useful.

For MMIO we'll hopefully be able to use virtio.

> Our experience with pv on x86 has been mostly negative.  It's not  
> trivial to get security right, it ended up slower than non-pv, and  
> hardware obsoleted it fairly quickly.

Yes, and I really don't want to overdo it. PV for mfmsr/mtmsr and  
mfspr/mtspr is really necessary. X86 simply has that in hardware.

Alex


* Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4
  2009-09-29  8:17 [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4 Alexander Graf
                   ` (3 preceding siblings ...)
  2009-09-30  9:11 ` Alexander Graf
@ 2009-09-30  9:24 ` Avi Kivity
  2009-09-30  9:37 ` Alexander Graf
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 244+ messages in thread
From: Avi Kivity @ 2009-09-30  9:24 UTC (permalink / raw)
  To: kvm-ppc

On 09/30/2009 11:11 AM, Alexander Graf wrote:
>>> The plan is to get qemu ppc64 guest support in a shape where it can 
>>> actually use the KVM support. As it is it's rather useless.
>>> When we have that, a PV interface would be needed to get things fast 
>>> and then the next thing on my list is the MMU notifiers.
>>
>> Um.  How slow is it today?  What paths are problematic? mmu, context 
>> switch?
>
> Instruction emulation.
>
> X86 with virtualization extensions doesn't trap often, as most of the 
> state can be safely handled within the guest mode.
> Now with PPC we're basically running in "ring 3" (called "problem 
> state" in ppc speech) which traps all the time because guests change 
> the IF or access some SPRs that we don't really need to trap on, but 
> only need to sync state with on #VMEXIT.
>
> So the PV idea here is to have a shared page between host and guest 
> that contains guest specific SPRs and other state (an MSR shadow for 
> example). That way the guest can patch itself to use that shared page 
> and KVM always knows about the most current state on #VMEXIT. At the 
> same time we're reducing exits by a _lot_.

But writing those registers often has side effects.  For example, 
enabling interrupts should also inject an interrupt when one is 
pending.  On x86 we have the same problem with the TPR on Windows XP, so 
we copy it to the guest on entry (along with the pending interrupt 
state) and back to the host on exit.  The guest uses an atomic operation 
to change the TPR and read pending interrupt information, and if an 
interrupt becomes unmasked, it calls a hypercall to trigger it.
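
The guest side of that is roughly the following (illustrative pseudo-C
only, not the actual code; the shared word and the hypercall wrapper
are stand-ins):

/* low byte: current TPR, next byte: priority of the highest pending
 * interrupt; both are synced by the host on guest entry/exit */
static void guest_set_tpr(volatile u32 *tpr_shadow, u8 new_tpr)
{
	u32 old, new;

	do {
		old = *tpr_shadow;
		new = (old & ~0xffu) | new_tpr;
	} while (cmpxchg(tpr_shadow, old, new) != old);

	/* lowering the TPR may unmask a pending interrupt; if so, ask
	 * the host to inject it now */
	if (((old >> 8) & 0xff) > new_tpr)
		hypercall_trigger_pending_irq();
}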

Presumably you'll be doing something similar?

In any case, I recommend keeping fine-grained control over those bits so 
they can be enabled/disabled/expanded as needed.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4
  2009-09-29  8:17 [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4 Alexander Graf
                   ` (4 preceding siblings ...)
  2009-09-30  9:24 ` Avi Kivity
@ 2009-09-30  9:37 ` Alexander Graf
  2009-10-02  0:26 ` Benjamin Herrenschmidt
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-09-30  9:37 UTC (permalink / raw)
  To: kvm-ppc


On 30.09.2009, at 11:24, Avi Kivity wrote:

> On 09/30/2009 11:11 AM, Alexander Graf wrote:
>>>> The plan is to get qemu ppc64 guest support in a shape where it  
>>>> can actually use the KVM support. As it is it's rather useless.
>>>> When we have that, a PV interface would be needed to get things  
>>>> fast and then the next thing on my list is the MMU notifiers.
>>>
>>> Um.  How slow is it today?  What paths are problematic? mmu,  
>>> context switch?
>>
>> Instruction emulation.
>>
>> X86 with virtualization extensions doesn't trap often, as most of  
>> the state can be safely handled within the guest mode.
>> Now with PPC we're basically running in "ring 3" (called "problem  
>> state" in ppc speech) which traps all the time because guests  
>> change the IF or access some SPRs that we don't really need to trap  
>> on, but only need to sync state with on #VMEXIT.
>>
>> So the PV idea here is to have a shared page between host and guest  
>> that contains guest specific SPRs and other state (an MSR shadow  
>> for example). That way the guest can patch itself to use that  
>> shared page and KVM always knows about the most current state on  
>> #VMEXIT. At the same time we're reducing exits by a _lot_.
>
> But writing those registers often has side effects.  For example,  
> enabling interrupts should also inject an interrupt when one is  
> pending.  On x86 we have the same problem with the TPR on Windows  
> XP, so we copy it to the guest on entry (along with the pending  
> interrupt state) and back to the host on exit.  The guest uses an  
> atomic operation to change the TPR and read pending interrupt  
> information, and if an interrupt becomes unmasked, it calls a  
> hypercall to trigger it.
>
> Presumably you'll be doing something similar?

Yes, very similar to the TPR hack. Just on a broader scale :-).

As far as MSR_IF = 1 goes, you're right. It does make sense to trap it  
in certain circumstances. Setting it to 0 doesn't need a trap though.

Also since we're PV'ed we could write an atomic bit in the shared page  
that says "hey guest - I have an interrupt waiting for you". When that  
is set, setting MSR_IF = 1 returns to the host. Something like that.
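
In made-up code, reusing the shared page idea from above, the guest side
could look something like this (MSR_IF is really MSR_EE in PPC terms,
and the hypercall name is invented):

static inline void guest_local_irq_enable(struct kvmppc_magic_page *mp)
{
	mp->msr |= MSR_EE;		/* cheap: just flip the shadow bit */
	smp_mb();			/* make sure we see the host's flag */
	if (mp->int_pending)
		hypercall_check_interrupts();	/* only now go back to the host */
}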

> In any case, I recommend keeping fine-grained control over those  
> bits so they can be enabled/disabled/expanded as needed.

Yeah, sounds like a good idea. We'll see how this goes. As long as  
there are only very few users we can change the interface as much as  
we like since we don't break anyone :-). And the PPC user base is  
smaller than the x86 one in general.

Alex


* Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4
  2009-09-29  8:17 [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4 Alexander Graf
                   ` (5 preceding siblings ...)
  2009-09-30  9:37 ` Alexander Graf
@ 2009-10-02  0:26 ` Benjamin Herrenschmidt
  2009-10-02  0:32 ` Benjamin Herrenschmidt
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 244+ messages in thread
From: Benjamin Herrenschmidt @ 2009-10-02  0:26 UTC (permalink / raw)
  To: kvm-ppc


> Yes, and I really don't want to overdo it. PV for mfmsr/mtmsr and  
> mfspr/mtspr is really necessary. X86 simply has that in hardware.

Note to Avi: This is also because we aren't actually using the
virtualization feature of the processor, but instead running
the guest basically in user space.

The reason for that is that today, you pretty much cannot access
the "hypervisor" mode of the CPU on any ppc64 machine, it's either
disabled by the service processor (Apple G5s, AFAIK PowerStation too) or
you are already running under some kind of hypervisor (All IBM machines,
PS3)...

Cheers,
Ben.




* Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4
  2009-09-29  8:17 [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4 Alexander Graf
                   ` (6 preceding siblings ...)
  2009-10-02  0:26 ` Benjamin Herrenschmidt
@ 2009-10-02  0:32 ` Benjamin Herrenschmidt
  2009-10-03 10:08 ` Avi Kivity
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 244+ messages in thread
From: Benjamin Herrenschmidt @ 2009-10-02  0:32 UTC (permalink / raw)
  To: kvm-ppc

On Wed, 2009-09-30 at 11:24 +0200, Avi Kivity wrote:

> But writing those registers often has side effects.  For example, 
> enabling interrupts should also inject an interrupt when one is 
> pending.  On x86 we have the same problem with the TPR on Windows XP, so 
> we copy it to the guest on entry (along with the pending interrupt 
> state) and back to the host on exit.  The guest uses an atomic operation 
> to change the TPR and read pending interrupt information, and if an 
> interrupt becomes unmasked, it calls a hypercall to trigger it.
> 
> Presumably you'll be doing something similar?
> 
> In any case, I recommend keeping fine-grained control over those bits so 
> they can be enabled/disabled/expanded as needed.

Sure, for those who have side-effects, special care must be taken,
either with PV tricks as Alex suggested in another reply, or by
emulation.

But for example, pretty much every time the MSR is written, it's also
previously -read-. If we keep a shadow of the guest MSR in the "magic
page", then we can already halve the number of emulation traps simply
by having the guest read from there instead, and still trap on writes.

Those reads don't have side effects.
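
In guest terms that's something vaguely like this (sketch only;
magic_page and the struct layout are whatever we end up defining, along
the lines of the page Alex sketched earlier in the thread):

extern struct kvmppc_magic_page *magic_page;	/* set up once at boot */

static inline unsigned long guest_mfmsr(void)
{
	/* reads come from the shadow: one plain load, no emulation trap */
	return magic_page->msr;
}

static inline void guest_mtmsr(unsigned long val)
{
	/* writes keep trapping, so the HV sees them and can apply any
	 * side effects before refreshing the shadow */
	asm volatile("mtmsr %0" : : "r" (val) : "memory");
}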

There's also a bunch of SPRs that don't have a direct side effect such
as the SRRs and SPRGs which are heavily used for exception entry and
exit. The former are storage for the PC and MSR values for a subsequent
rfi instruction (return from interrupt) and the latter are just general
purpose storage for the kernel to save a few GPRs into in the exception
handling code.

By replacing these by, for example, absolute load/stores in a magic
page mapped differently per-CPU (one trick we have in mind) we can
very significantly speed up the guest kernel exception entry and exit,
and this is without dealing with side effects.
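
Purely as an illustration (the fixed address is invented, the layout is
along the lines of Alex's shared page sketch):

#define MAGIC_PAGE_EA	((struct kvmppc_magic_page *)0x00003000UL)

static inline void guest_save_scratch(unsigned long val)
{
	/* one store to a fixed EA that KVM maps to per-CPU backing store,
	 * instead of a trapping mtspr SPRN_SPRG_SCRATCH0 */
	MAGIC_PAGE_EA->sprg0 = val;
}

static inline unsigned long guest_read_srr0(void)
{
	/* one load instead of a trapping mfspr SPRN_SRR0 */
	return MAGIC_PAGE_EA->srr0;
}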

As Alex mentioned, we can from there try to go further for things like
MSR changes, but you are right that this needs to be done more
carefully.

Cheers,
Ben.





* Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4
  2009-09-29  8:17 [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4 Alexander Graf
                   ` (7 preceding siblings ...)
  2009-10-02  0:32 ` Benjamin Herrenschmidt
@ 2009-10-03 10:08 ` Avi Kivity
  2009-10-03 10:58 ` Benjamin Herrenschmidt
  2009-10-03 11:10 ` Benjamin Herrenschmidt
  10 siblings, 0 replies; 244+ messages in thread
From: Avi Kivity @ 2009-10-03 10:08 UTC (permalink / raw)
  To: kvm-ppc

On 10/02/2009 02:32 AM, Benjamin Herrenschmidt wrote:
> On Wed, 2009-09-30 at 11:24 +0200, Avi Kivity wrote:
>
>    
>> But writing those registers often has side effects.  For example,
>> enabling interrupts should also inject an interrupt when one is
>> pending.  On x86 we have the same problem with the TPR on Windows XP, so
>> we copy it to the guest on entry (along with the pending interrupt
>> state) and back to the host on exit.  The guest uses an atomic operation
>> to change the TPR and read pending interrupt information, and if an
>> interrupt becomes unmasked, it calls a hypercall to trigger it.
>>
>> Presumably you'll be doing something similar?
>>
>> In any case, I recommend keeping fine-grained control over those bits so
>> they can be enabled/disabled/expanded as needed.
>>      
> Sure, for those who have side-effects, special care must be taken,
> either with PV tricks as Alex suggested in another reply, or by
> emulation.
>
> But for example, pretty much every time the MSR is written, it's also
> previously -read-. If we keep a shadow of the guest MSR in the "magic
> page", then we can already halve the number of emulation traps simply
> by having the guest read from there instead, and still trap on writes.
>
> Those reads don't have side effects.
>    

So these MSRs can be modified by the hypervisor?  Otherwise you'd cache 
them in the guest with no hypervisor involvement, right?  (just making 
sure :)

> There's also a bunch of SPRs that don't have a direct side effect such
> as the SRRs and SPRGs which are heavily used for exception entry and
> exit. The former are storage for the PC and MSR values for a subsequent
> rfi instruction (return from interrupt) and the latter are just general
> purpose storage for the kernel to save a few GPRs into in the exception
> handling code.
>
> By replacing these by, for example, absolute load/stores in a magic
> page mapped differently per-CPU (one trick we have in mind) we can
> very significantly speed up the guest kernel exception entry and exit,
> and this is without dealing with side effects.
>
> As Alex mentioned, we can from there try to go further for things like
> MSR changes, but you are right that this needs to be done more
> carefully.
>    

Thanks for the explanations.


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4
  2009-09-29  8:17 [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4 Alexander Graf
                   ` (8 preceding siblings ...)
  2009-10-03 10:08 ` Avi Kivity
@ 2009-10-03 10:58 ` Benjamin Herrenschmidt
  2009-10-03 11:10 ` Benjamin Herrenschmidt
  10 siblings, 0 replies; 244+ messages in thread
From: Benjamin Herrenschmidt @ 2009-10-03 10:58 UTC (permalink / raw)
  To: kvm-ppc

On Sat, 2009-10-03 at 12:08 +0200, Avi Kivity wrote:
> 
> So these MSRs can be modified by the hypervisor?  Otherwise you'd cache 
> them in the guest with no hypervisor involvement, right?  (just making 
> sure :)

There's one MSR :-) Among others, it can be altered by the act of
taking an interrupt (for example, it contains the PR bit, which means
user vs. supervisor, things like that).

Cheers,
Ben.




* Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4
  2009-09-29  8:17 [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4 Alexander Graf
                   ` (9 preceding siblings ...)
  2009-10-03 10:58 ` Benjamin Herrenschmidt
@ 2009-10-03 11:10 ` Benjamin Herrenschmidt
  10 siblings, 0 replies; 244+ messages in thread
From: Benjamin Herrenschmidt @ 2009-10-03 11:10 UTC (permalink / raw)
  To: kvm-ppc

On Sat, 2009-10-03 at 20:59 +1000, Benjamin Herrenschmidt wrote:
> On Sat, 2009-10-03 at 12:08 +0200, Avi Kivity wrote:
> > 
> > So these MSRs can be modified by the hypervisor?  Otherwise you'd cache 
> > them in the guest with no hypervisor involvement, right?  (just making 
> > sure :)
> 
> There's one MSR :-) Among others, it can be altered by the act of
> taking an interrupt (for example, it contains the PR bit, which means
> user vs. supervisor, things like that).

For a bit more context...

On PowerPC, all those "special" registers are called "SPR"s (Special
Purpose Registers, surprise ! :-)

They are generally accessed via mfspr/mtspr instructions that encode
the SPR number, though some of them can also have dedicated instructions
or be set as a side effect of some instructions or events etc...

MSR is a bit special here because it's not per se an SPR. It's the
Machine State Register; in the core it sits in the fast path of a whole
bunch of pipeline stages, and it contains the state of things such as
the current privilege level, the state of MMU translation for I and D,
the interrupt enable bit, etc... It's accessed via specific mfmsr/mtmsr
instructions (to simplify; there are also other instructions that modify
the MSR as a side effect, interrupts do that too, etc...).

So the MSR warrants special treatment for KVM. Other SPRs may or may not
depending on what they are. Some are just storage like the SPRGs, some
contain a copy of the previous PC and MSR when taking an interrupt (SRR0
and SRR1) and are used by the rfi instruction to restore them when
returning from an interrupt, and some are totally unrelated (such as
the decrementer which is our core timer facility) or other processor
specific registers containing various things like cache configuration
etc...

The main issue with kernel entry / exit performance, though, revolves
around MSR, SPRG and SRR0/1 accesses. SPRGs could -almost- be entirely
guest cached, but since the goal is to save a register to use as scratch
at a time when no register can be clobbered, saving a register to them
must fit in one instruction that has no side effect. The typical option
we are thinking about here is a store-absolute to an address that KVM
can then map to some per-CPU storage page.

Things like SRR0/SRR1 can be replaced by similar load/stores as long as
the HV sets them appropriately with the original MSR (or emulated MSR)
and PC when directing an interrupt to the guest, and knows where to
retrieve the content set by the kernel when emulating an "rfi"
instruction. The MSR can always be read from the cache by the guest as
long as the HV knows how to alter its cached value when directing
an interrupt to the guest or emulating another of those instructions
that can affect it (such as rfi of course), etc...
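
On the HV side that boils down to roughly the following (names invented
and heavily simplified - a real version has to clear more MSR bits than
just EE when taking an interrupt):

static void deliver_guest_interrupt(struct kvm_vcpu *vcpu,
				    struct kvmppc_magic_page *mp,
				    unsigned long vector)
{
	mp->srr0 = vcpu->arch.pc;	/* where the guest was interrupted */
	mp->srr1 = mp->msr;		/* its emulated MSR at that point */
	mp->msr &= ~MSR_EE;		/* interrupts off on entry (simplified) */
	vcpu->arch.pc = vector;		/* enter the guest's handler */
}

static void emulate_rfi(struct kvm_vcpu *vcpu, struct kvmppc_magic_page *mp)
{
	vcpu->arch.pc = mp->srr0;	/* return address the guest kernel set up */
	mp->msr = mp->srr1;		/* and the MSR it wants restored */
}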

So in our case, that (relatively small) level of paravirt provides a
tremendous performance boost, since every guest interrupt (syscall,
etc...) goes down from something like a good dozen emulation traps
to maybe a couple just for the base entry/exit path from the kernel.

This is very different from the issues around PV that you guys had in
the x86 world related to MMU emulation, though in our case PV may also
prove useful there, as our MMU structure is very different; that is a
completely orthogonal matter.

Cheers,
Ben.




* [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5
  2009-09-29  8:17 [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4 Alexander Graf
@ 2009-10-21 15:03 ` Alexander Graf
  2009-09-30  8:47 ` Alexander Graf
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti

KVM for PowerPC only supports embedded cores at the moment.

While it makes sense to virtualize on small machines, it's even more fun
to do so on big boxes. So I figured we need KVM for PowerPC64 as well.

This patchset implements KVM support for Book3s_64 hosts and guest support
for Book3s_64 and G3/G4.

To really make use of this, you also need a recent version of qemu.


Don't want to apply patches? Get the git tree!

$ git clone git://csgraf.de/kvm
$ git checkout origin/ppc-v4

V1 -> V2:

 - extend sregs with padding
 - new naming scheme (ppc64 -> book3s_64; 74xx -> book3s_32)
 - to_phys -> in-kernel tophys()
 - loadimm -> LOAD_REG_IMMEDIATE
 - call .ko kvm.ko
 - set magic paca bit later
 - run guest code with PACA->soft_enabled=true
 - pt_regs for host state saving (guest too?)
 - only do HV dcbz trick on 970
 - refuse to run on LPAR because of missing SLB pieces

V2 -> V3:

 - fix DAR/DSISR saving
 - allow running on LPAR by modifying the SLB shadow
 - change the SLB implementation to use a mem-backed cache and do
   full world switch on enter/exit. gets rid of "context" magic
 - be more aggressive about DEC injection
 - remove fast ld/st because we're always in host context
 - don't use SPRGs in real->paged transition
 - implement dirty log
 - remove MMIO speedup code
 - SPRG cleanup
   - rename SPRG3 -> SPRN_SPRG_PACA
   - rename SPRG1 -> SPRN_SPRG_SCRATCH0
   - don't use SPRG2

V3 -> V4:

 - use context_id instead of mm_alloc
 - export less

V4 -> V5:

 - use get_tb instead of mftb
 - make ppc32 and ppc64 emulation share more code
 - make pvr 32 bits
 - add patch to use hrtimer for decrementer

Alexander Graf (27):
  Move dirty logging code to sub-arch
  Pass PVR in sregs
  Add Book3s definitions
  Add Book3s fields to vcpu structs
  Add asm/kvm_book3s.h
  Add Book3s_64 intercept helpers
  Add book3s_64 highmem asm code
  Add SLB switching code for entry/exit
  Add interrupt handling code
  Add book3s.c
  Add book3s_64 Host MMU handling
  Add book3s_64 guest MMU
  Add book3s_32 guest MMU
  Add book3s_64 specific opcode emulation
  Add mfdec emulation
  Add desktop PowerPC specific emulation
  Make head_64.S aware of KVM real mode code
  Add Book3s_64 offsets to asm-offsets.c
  Export symbols for KVM module
  Split init_new_context and destroy_context
  Export KVM symbols for module
  Add fields to PACA
  Export new PACA constants in asm-offsets
  Include Book3s_64 target in buildsystem
  Fix trace.h
  Use Little Endian for Dirty Bitmap
  Use hrtimers for the decrementer

 arch/powerpc/include/asm/exception-64s.h     |    2 +
 arch/powerpc/include/asm/kvm.h               |    2 +
 arch/powerpc/include/asm/kvm_asm.h           |   39 ++
 arch/powerpc/include/asm/kvm_book3s.h        |  136 ++++
 arch/powerpc/include/asm/kvm_book3s_64_asm.h |   58 ++
 arch/powerpc/include/asm/kvm_host.h          |   79 +++-
 arch/powerpc/include/asm/kvm_ppc.h           |    1 +
 arch/powerpc/include/asm/mmu_context.h       |    5 +
 arch/powerpc/include/asm/paca.h              |    9 +
 arch/powerpc/kernel/asm-offsets.c            |   18 +
 arch/powerpc/kernel/exceptions-64s.S         |    8 +
 arch/powerpc/kernel/head_64.S                |    7 +
 arch/powerpc/kernel/ppc_ksyms.c              |    3 +-
 arch/powerpc/kernel/time.c                   |    1 +
 arch/powerpc/kvm/Kconfig                     |   17 +
 arch/powerpc/kvm/Makefile                    |   27 +-
 arch/powerpc/kvm/book3s.c                    |  919 ++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_32_mmu.c             |  354 ++++++++++
 arch/powerpc/kvm/book3s_64_emulate.c         |  338 ++++++++++
 arch/powerpc/kvm/book3s_64_exports.c         |   24 +
 arch/powerpc/kvm/book3s_64_interrupts.S      |  392 +++++++++++
 arch/powerpc/kvm/book3s_64_mmu.c             |  469 +++++++++++++
 arch/powerpc/kvm/book3s_64_mmu_host.c        |  412 ++++++++++++
 arch/powerpc/kvm/book3s_64_rmhandlers.S      |  131 ++++
 arch/powerpc/kvm/book3s_64_slb.S             |  277 ++++++++
 arch/powerpc/kvm/booke.c                     |    5 +
 arch/powerpc/kvm/emulate.c                   |   66 ++-
 arch/powerpc/kvm/powerpc.c                   |   25 +-
 arch/powerpc/kvm/trace.h                     |    6 +-
 arch/powerpc/mm/hash_utils_64.c              |    2 +
 arch/powerpc/mm/mmu_context_hash64.c         |   24 +-
 virt/kvm/kvm_main.c                          |    5 +-
 32 files changed, 3827 insertions(+), 34 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kvm_book3s.h
 create mode 100644 arch/powerpc/include/asm/kvm_book3s_64_asm.h
 create mode 100644 arch/powerpc/kvm/book3s.c
 create mode 100644 arch/powerpc/kvm/book3s_32_mmu.c
 create mode 100644 arch/powerpc/kvm/book3s_64_emulate.c
 create mode 100644 arch/powerpc/kvm/book3s_64_exports.c
 create mode 100644 arch/powerpc/kvm/book3s_64_interrupts.S
 create mode 100644 arch/powerpc/kvm/book3s_64_mmu.c
 create mode 100644 arch/powerpc/kvm/book3s_64_mmu_host.c
 create mode 100644 arch/powerpc/kvm/book3s_64_rmhandlers.S
 create mode 100644 arch/powerpc/kvm/book3s_64_slb.S




* [PATCH 01/27] Move dirty logging code to sub-arch
  2009-10-21 15:03 ` Alexander Graf
@ 2009-10-21 15:03   ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti

PowerPC code handles dirty logging in the generic parts atm. While this
is great for "return -ENOTSUPP", we need to be rather target specific
when actually implementing it.

So let's split it into implementation specific code, so we can implement
it for book3s.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kvm/booke.c   |    5 +++++
 arch/powerpc/kvm/powerpc.c |    5 -----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index e7bf4d0..06f5a9e 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -520,6 +520,11 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
 	return kvmppc_core_vcpu_translate(vcpu, tr);
 }
 
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+{
+	return -ENOTSUPP;
+}
+
 int __init kvmppc_booke_init(void)
 {
 	unsigned long ivor[16];
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 5902bbc..4ae3490 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -410,11 +410,6 @@ out:
 	return r;
 }
 
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
-{
-	return -ENOTSUPP;
-}
-
 long kvm_arch_vm_ioctl(struct file *filp,
                        unsigned int ioctl, unsigned long arg)
 {
-- 
1.6.0.2





* [PATCH 02/27] Pass PVR in sregs
@ 2009-10-21 15:03       ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips,
	Marcelo Tosatti

Right now sregs is unused on PPC, so we can use it for initialization
of the CPU.

KVM on BookE always virtualizes the host CPU. On Book3s we go a step further
and take the PVR from userspace that tells us what kind of CPU we are supposed
to virtualize, because we support Book3s_32 and Book3s_64 guests.

In order to get that information, we use the sregs ioctl, because we don't
want to reset the guest CPU on every normal register set.
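
For illustration only (this is not part of the patch), the userspace
side then just fills in the new field and issues the existing ioctl;
vcpu_fd is assumed to come from KVM_CREATE_VCPU and the PVR value from
the machine model:

#include <linux/kvm.h>
#include <string.h>
#include <stdio.h>
#include <sys/ioctl.h>

static int set_guest_pvr(int vcpu_fd, __u32 pvr)
{
	struct kvm_sregs sregs;

	memset(&sregs, 0, sizeof(sregs));
	sregs.pvr = pvr;	/* the CPU type the guest should see */
	if (ioctl(vcpu_fd, KVM_SET_SREGS, &sregs) < 0) {
		perror("KVM_SET_SREGS");
		return -1;
	}
	return 0;
}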

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v4 -> v5

  - make PVR 32 bits
---
 arch/powerpc/include/asm/kvm.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm.h b/arch/powerpc/include/asm/kvm.h
index bb2de6a..c9ca97f 100644
--- a/arch/powerpc/include/asm/kvm.h
+++ b/arch/powerpc/include/asm/kvm.h
@@ -46,6 +46,8 @@ struct kvm_regs {
 };
 
 struct kvm_sregs {
+	__u32 pvr;
+	char pad[1020];
 };
 
 struct kvm_fpu {
-- 
1.6.0.2



* [PATCH 03/27] Add Book3s definitions
  2009-10-21 15:03       ` Alexander Graf
@ 2009-10-21 15:03         ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti

We need quite a bunch of new constants for KVM on Book3s,
so let's define them now.

These constants will be used in later patches.

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v3 -> v4

  - remove old kernel compat code
---
 arch/powerpc/include/asm/kvm_asm.h |   39 ++++++++++++++++++++++++++++++++++++
 1 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
index 56bfae5..19ddb35 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -49,6 +49,45 @@
 #define BOOKE_INTERRUPT_SPE_FP_ROUND 34
 #define BOOKE_INTERRUPT_PERFORMANCE_MONITOR 35
 
+/* book3s */
+
+#define BOOK3S_INTERRUPT_SYSTEM_RESET	0x100
+#define BOOK3S_INTERRUPT_MACHINE_CHECK	0x200
+#define BOOK3S_INTERRUPT_DATA_STORAGE	0x300
+#define BOOK3S_INTERRUPT_DATA_SEGMENT	0x380
+#define BOOK3S_INTERRUPT_INST_STORAGE	0x400
+#define BOOK3S_INTERRUPT_INST_SEGMENT	0x480
+#define BOOK3S_INTERRUPT_EXTERNAL	0x500
+#define BOOK3S_INTERRUPT_ALIGNMENT	0x600
+#define BOOK3S_INTERRUPT_PROGRAM	0x700
+#define BOOK3S_INTERRUPT_FP_UNAVAIL	0x800
+#define BOOK3S_INTERRUPT_DECREMENTER	0x900
+#define BOOK3S_INTERRUPT_SYSCALL	0xc00
+#define BOOK3S_INTERRUPT_TRACE		0xd00
+#define BOOK3S_INTERRUPT_PERFMON	0xf00
+#define BOOK3S_INTERRUPT_ALTIVEC	0xf20
+#define BOOK3S_INTERRUPT_VSX		0xf40
+
+#define BOOK3S_IRQPRIO_SYSTEM_RESET		0
+#define BOOK3S_IRQPRIO_DATA_SEGMENT		1
+#define BOOK3S_IRQPRIO_INST_SEGMENT		2
+#define BOOK3S_IRQPRIO_DATA_STORAGE		3
+#define BOOK3S_IRQPRIO_INST_STORAGE		4
+#define BOOK3S_IRQPRIO_ALIGNMENT		5
+#define BOOK3S_IRQPRIO_PROGRAM			6
+#define BOOK3S_IRQPRIO_FP_UNAVAIL		7
+#define BOOK3S_IRQPRIO_ALTIVEC			8
+#define BOOK3S_IRQPRIO_VSX			9
+#define BOOK3S_IRQPRIO_SYSCALL			10
+#define BOOK3S_IRQPRIO_MACHINE_CHECK		11
+#define BOOK3S_IRQPRIO_DEBUG			12
+#define BOOK3S_IRQPRIO_EXTERNAL			13
+#define BOOK3S_IRQPRIO_DECREMENTER		14
+#define BOOK3S_IRQPRIO_PERFORMANCE_MONITOR	15
+#define BOOK3S_IRQPRIO_MAX			16
+
+#define BOOK3S_HFLAG_DCBZ32			0x1
+
 #define RESUME_FLAG_NV          (1<<0)  /* Reload guest nonvolatile state? */
 #define RESUME_FLAG_HOST        (1<<1)  /* Resume host? */
 
-- 
1.6.0.2




* [PATCH 04/27] Add Book3s fields to vcpu structs
  2009-10-21 15:03         ` Alexander Graf
@ 2009-10-21 15:03           ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti

We need to store more information than we currently have for vcpus
when running on Book3s.

So let's extend the internal struct definitions.

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v3 -> v4:

  - use context_id instead of mm_context

v4 -> v5:

  - always include pvr in vcpu struct
---
 arch/powerpc/include/asm/kvm_host.h |   73 ++++++++++++++++++++++++++++++++++-
 1 files changed, 72 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index c9c930e..2cff5fe 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -37,6 +37,8 @@
 #define KVM_NR_PAGE_SIZES	1
 #define KVM_PAGES_PER_HPAGE(x)	(1UL<<31)
 
+#define HPTEG_CACHE_NUM 1024
+
 struct kvm;
 struct kvm_run;
 struct kvm_vcpu;
@@ -63,6 +65,17 @@ struct kvm_vcpu_stat {
 	u32 dec_exits;
 	u32 ext_intr_exits;
 	u32 halt_wakeup;
+#ifdef CONFIG_PPC64
+	u32 pf_storage;
+	u32 pf_instruc;
+	u32 sp_storage;
+	u32 sp_instruc;
+	u32 queue_intr;
+	u32 ld;
+	u32 ld_slow;
+	u32 st;
+	u32 st_slow;
+#endif
 };
 
 enum kvm_exit_types {
@@ -109,9 +122,53 @@ struct kvmppc_exit_timing {
 struct kvm_arch {
 };
 
+struct kvmppc_pte {
+	u64 eaddr;
+	u64 vpage;
+	u64 raddr;
+	bool may_read;
+	bool may_write;
+	bool may_execute;
+};
+
+struct kvmppc_mmu {
+	/* book3s_64 only */
+	void (*slbmte)(struct kvm_vcpu *vcpu, u64 rb, u64 rs);
+	u64  (*slbmfee)(struct kvm_vcpu *vcpu, u64 slb_nr);
+	u64  (*slbmfev)(struct kvm_vcpu *vcpu, u64 slb_nr);
+	void (*slbie)(struct kvm_vcpu *vcpu, u64 slb_nr);
+	void (*slbia)(struct kvm_vcpu *vcpu);
+	/* book3s */
+	void (*mtsrin)(struct kvm_vcpu *vcpu, u32 srnum, ulong value);
+	u32  (*mfsrin)(struct kvm_vcpu *vcpu, u32 srnum);
+	int  (*xlate)(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *pte, bool data);
+	void (*reset_msr)(struct kvm_vcpu *vcpu);
+	void (*tlbie)(struct kvm_vcpu *vcpu, ulong addr, bool large);
+	int  (*esid_to_vsid)(struct kvm_vcpu *vcpu, u64 esid, u64 *vsid);
+	u64  (*ea_to_vp)(struct kvm_vcpu *vcpu, gva_t eaddr, bool data);
+	bool (*is_dcbz32)(struct kvm_vcpu *vcpu);
+};
+
+struct hpte_cache {
+	u64 host_va;
+	u64 pfn;
+	ulong slot;
+	struct kvmppc_pte pte;
+};
+
 struct kvm_vcpu_arch {
-	u32 host_stack;
+	ulong host_stack;
 	u32 host_pid;
+#ifdef CONFIG_PPC64
+	ulong host_msr;
+	ulong host_r2;
+	void *host_retip;
+	ulong trampoline_lowmem;
+	ulong trampoline_enter;
+	ulong highmem_handler;
+	ulong host_paca_phys;
+	struct kvmppc_mmu mmu;
+#endif
 
 	u64 fpr[32];
 	ulong gpr[32];
@@ -123,6 +180,10 @@ struct kvm_vcpu_arch {
 	ulong xer;
 
 	ulong msr;
+#ifdef CONFIG_PPC64
+	ulong shadow_msr;
+	ulong hflags;
+#endif
 	u32 mmucr;
 	ulong sprg0;
 	ulong sprg1;
@@ -149,6 +210,7 @@ struct kvm_vcpu_arch {
 	u32 ivor[64];
 	ulong ivpr;
 	u32 pir;
+	u32 pvr;
 
 	u32 shadow_pid;
 	u32 pid;
@@ -174,6 +236,9 @@ struct kvm_vcpu_arch {
 #endif
 
 	u32 last_inst;
+#ifdef CONFIG_PPC64
+	ulong fault_dsisr;
+#endif
 	ulong fault_dear;
 	ulong fault_esr;
 	gpa_t paddr_accessed;
@@ -186,7 +251,13 @@ struct kvm_vcpu_arch {
 	u32 cpr0_cfgaddr; /* holds the last set cpr0_cfgaddr */
 
 	struct timer_list dec_timer;
+	u64 dec_jiffies;
 	unsigned long pending_exceptions;
+
+#ifdef CONFIG_PPC64
+	struct hpte_cache hpte_cache[HPTEG_CACHE_NUM];
+	int hpte_cache_offset;
+#endif
 };
 
 #endif /* __POWERPC_KVM_HOST_H__ */
-- 
1.6.0.2



* [PATCH 04/27] Add Book3s fields to vcpu structs
@ 2009-10-21 15:03           ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti

We need to store more information than we currently have for vcpus
when running on Book3s.

So let's extend the internal struct definitions.
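
To give a feel for how the new per-vcpu MMU callback table is meant to be
consumed, here is a minimal C sketch of a fault path asking the guest MMU
model for a translation. Only struct kvmppc_pte and the mmu.xlate callback
come from this patch; the function name and the error handling are
illustrative and assume the usual kernel/KVM headers.

	/* Illustrative sketch, not part of the patch. */
	static int example_xlate_fault(struct kvm_vcpu *vcpu, gva_t eaddr, bool data)
	{
		struct kvmppc_pte pte;
		int r;

		/* Ask whichever guest MMU model is active (book3s_32 or _64). */
		r = vcpu->arch.mmu.xlate(vcpu, eaddr, &pte, data);
		if (r < 0)
			return r;	/* no mapping, reflect a storage interrupt */

		/* pte.raddr now holds the guest real address,
		 * pte.may_read/may_write/may_execute the permissions. */
		return 0;
	}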

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v3 -> v4:

  - use context_id instead of mm_context

v4 -> v5:

  - always include pvr in vcpu struct
---
 arch/powerpc/include/asm/kvm_host.h |   73 ++++++++++++++++++++++++++++++++++-
 1 files changed, 72 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index c9c930e..2cff5fe 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -37,6 +37,8 @@
 #define KVM_NR_PAGE_SIZES	1
 #define KVM_PAGES_PER_HPAGE(x)	(1UL<<31)
 
+#define HPTEG_CACHE_NUM 1024
+
 struct kvm;
 struct kvm_run;
 struct kvm_vcpu;
@@ -63,6 +65,17 @@ struct kvm_vcpu_stat {
 	u32 dec_exits;
 	u32 ext_intr_exits;
 	u32 halt_wakeup;
+#ifdef CONFIG_PPC64
+	u32 pf_storage;
+	u32 pf_instruc;
+	u32 sp_storage;
+	u32 sp_instruc;
+	u32 queue_intr;
+	u32 ld;
+	u32 ld_slow;
+	u32 st;
+	u32 st_slow;
+#endif
 };
 
 enum kvm_exit_types {
@@ -109,9 +122,53 @@ struct kvmppc_exit_timing {
 struct kvm_arch {
 };
 
+struct kvmppc_pte {
+	u64 eaddr;
+	u64 vpage;
+	u64 raddr;
+	bool may_read;
+	bool may_write;
+	bool may_execute;
+};
+
+struct kvmppc_mmu {
+	/* book3s_64 only */
+	void (*slbmte)(struct kvm_vcpu *vcpu, u64 rb, u64 rs);
+	u64  (*slbmfee)(struct kvm_vcpu *vcpu, u64 slb_nr);
+	u64  (*slbmfev)(struct kvm_vcpu *vcpu, u64 slb_nr);
+	void (*slbie)(struct kvm_vcpu *vcpu, u64 slb_nr);
+	void (*slbia)(struct kvm_vcpu *vcpu);
+	/* book3s */
+	void (*mtsrin)(struct kvm_vcpu *vcpu, u32 srnum, ulong value);
+	u32  (*mfsrin)(struct kvm_vcpu *vcpu, u32 srnum);
+	int  (*xlate)(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *pte, bool data);
+	void (*reset_msr)(struct kvm_vcpu *vcpu);
+	void (*tlbie)(struct kvm_vcpu *vcpu, ulong addr, bool large);
+	int  (*esid_to_vsid)(struct kvm_vcpu *vcpu, u64 esid, u64 *vsid);
+	u64  (*ea_to_vp)(struct kvm_vcpu *vcpu, gva_t eaddr, bool data);
+	bool (*is_dcbz32)(struct kvm_vcpu *vcpu);
+};
+
+struct hpte_cache {
+	u64 host_va;
+	u64 pfn;
+	ulong slot;
+	struct kvmppc_pte pte;
+};
+
 struct kvm_vcpu_arch {
-	u32 host_stack;
+	ulong host_stack;
 	u32 host_pid;
+#ifdef CONFIG_PPC64
+	ulong host_msr;
+	ulong host_r2;
+	void *host_retip;
+	ulong trampoline_lowmem;
+	ulong trampoline_enter;
+	ulong highmem_handler;
+	ulong host_paca_phys;
+	struct kvmppc_mmu mmu;
+#endif
 
 	u64 fpr[32];
 	ulong gpr[32];
@@ -123,6 +180,10 @@ struct kvm_vcpu_arch {
 	ulong xer;
 
 	ulong msr;
+#ifdef CONFIG_PPC64
+	ulong shadow_msr;
+	ulong hflags;
+#endif
 	u32 mmucr;
 	ulong sprg0;
 	ulong sprg1;
@@ -149,6 +210,7 @@ struct kvm_vcpu_arch {
 	u32 ivor[64];
 	ulong ivpr;
 	u32 pir;
+	u32 pvr;
 
 	u32 shadow_pid;
 	u32 pid;
@@ -174,6 +236,9 @@ struct kvm_vcpu_arch {
 #endif
 
 	u32 last_inst;
+#ifdef CONFIG_PPC64
+	ulong fault_dsisr;
+#endif
 	ulong fault_dear;
 	ulong fault_esr;
 	gpa_t paddr_accessed;
@@ -186,7 +251,13 @@ struct kvm_vcpu_arch {
 	u32 cpr0_cfgaddr; /* holds the last set cpr0_cfgaddr */
 
 	struct timer_list dec_timer;
+	u64 dec_jiffies;
 	unsigned long pending_exceptions;
+
+#ifdef CONFIG_PPC64
+	struct hpte_cache hpte_cache[HPTEG_CACHE_NUM];
+	int hpte_cache_offset;
+#endif
 };
 
 #endif /* __POWERPC_KVM_HOST_H__ */
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 05/27] Add asm/kvm_book3s.h
  2009-10-21 15:03           ` Alexander Graf
@ 2009-10-21 15:03             ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti

This adds the book3s specific header file that contains structs that
are only valid on book3s specific code.
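
A minimal usage sketch, to show the intent: book3s code converts the generic
vcpu pointer with to_book3s() and then indexes the tables defined here, for
example the ESID->VSID map. The helper name below is made up; only the types
and the SID_MAP_* macros come from this header.

	/* Illustrative sketch only. */
	static struct kvmppc_sid_map *example_find_sid(struct kvm_vcpu *vcpu, u64 gvsid)
	{
		struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
		struct kvmppc_sid_map *map;

		map = &vcpu_book3s->sid_map[gvsid & SID_MAP_MASK];
		if (map->valid && map->guest_vsid == gvsid)
			return map;

		return NULL;	/* not cached yet, caller has to set up a host VSID */
	}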

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v3 -> v4:

  - use context_id instead of mm_alloc
---
 arch/powerpc/include/asm/kvm_book3s.h |  136 +++++++++++++++++++++++++++++++++
 1 files changed, 136 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kvm_book3s.h

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
new file mode 100644
index 0000000..c601133
--- /dev/null
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -0,0 +1,136 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+#ifndef __ASM_KVM_BOOK3S_H__
+#define __ASM_KVM_BOOK3S_H__
+
+#include <linux/types.h>
+#include <linux/kvm_host.h>
+#include <asm/kvm_ppc.h>
+
+struct kvmppc_slb {
+	u64 esid;
+	u64 vsid;
+	u64 orige;
+	u64 origv;
+	bool valid;
+	bool Ks;
+	bool Kp;
+	bool nx;
+	bool large;
+	bool class;
+};
+
+struct kvmppc_sr {
+	u32 raw;
+	u32 vsid;
+	bool Ks;
+	bool Kp;
+	bool nx;
+};
+
+struct kvmppc_bat {
+	u32 bepi;
+	u32 bepi_mask;
+	bool vs;
+	bool vp;
+	u32 brpn;
+	u8 wimg;
+	u8 pp;
+};
+
+struct kvmppc_sid_map {
+	u64 guest_vsid;
+	u64 guest_esid;
+	u64 host_vsid;
+	bool valid;
+};
+
+#define SID_MAP_BITS    9
+#define SID_MAP_NUM     (1 << SID_MAP_BITS)
+#define SID_MAP_MASK    (SID_MAP_NUM - 1)
+
+struct kvmppc_vcpu_book3s {
+	struct kvm_vcpu vcpu;
+	struct kvmppc_sid_map sid_map[SID_MAP_NUM];
+	struct kvmppc_slb slb[64];
+	struct {
+		u64 esid;
+		u64 vsid;
+	} slb_shadow[64];
+	u8 slb_shadow_max;
+	struct kvmppc_sr sr[16];
+	struct kvmppc_bat ibat[8];
+	struct kvmppc_bat dbat[8];
+	u64 hid[6];
+	int slb_nr;
+	u64 sdr1;
+	u64 dsisr;
+	u64 hior;
+	u64 msr_mask;
+	u64 vsid_first;
+	u64 vsid_next;
+	u64 vsid_max;
+	int context_id;
+};
+
+#define CONTEXT_HOST		0
+#define CONTEXT_GUEST		1
+#define CONTEXT_GUEST_END	2
+
+#define VSID_REAL	0xfffffffffff00000
+#define VSID_REAL_DR	0xffffffffffe00000
+#define VSID_REAL_IR	0xffffffffffd00000
+#define VSID_BAT	0xffffffffffc00000
+#define VSID_PR		0x8000000000000000
+
+extern void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, u64 ea, u64 ea_mask);
+extern void kvmppc_mmu_pte_vflush(struct kvm_vcpu *vcpu, u64 vp, u64 vp_mask);
+extern void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, u64 pa_start, u64 pa_end);
+extern void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 new_msr);
+extern void kvmppc_mmu_book3s_64_init(struct kvm_vcpu *vcpu);
+extern void kvmppc_mmu_book3s_32_init(struct kvm_vcpu *vcpu);
+extern int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte);
+extern int kvmppc_mmu_map_segment(struct kvm_vcpu *vcpu, ulong eaddr);
+extern void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu);
+extern struct kvmppc_pte *kvmppc_mmu_find_pte(struct kvm_vcpu *vcpu, u64 ea, bool data);
+extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong eaddr, int size, void *ptr, bool data);
+extern int kvmppc_st(struct kvm_vcpu *vcpu, ulong eaddr, int size, void *ptr);
+extern void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec);
+
+extern u32 kvmppc_trampoline_lowmem;
+extern u32 kvmppc_trampoline_enter;
+
+static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
+{
+	return container_of(vcpu, struct kvmppc_vcpu_book3s, vcpu);
+}
+
+static inline ulong dsisr(void)
+{
+	ulong r;
+	asm ( "mfdsisr %0 " : "=r" (r) );
+	return r;
+}
+
+extern void kvm_return_point(void);
+
+#define INS_DCBZ			0x7c0007ec
+
+#endif /* __ASM_KVM_BOOK3S_H__ */
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 05/27] Add asm/kvm_book3s.h
@ 2009-10-21 15:03             ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti

This adds the book3s specific header file that contains structs that
are only valid on book3s specific code.

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v3 -> v4:

  - use context_id instead of mm_alloc
---
 arch/powerpc/include/asm/kvm_book3s.h |  136 +++++++++++++++++++++++++++++++++
 1 files changed, 136 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kvm_book3s.h

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
new file mode 100644
index 0000000..c601133
--- /dev/null
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -0,0 +1,136 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+#ifndef __ASM_KVM_BOOK3S_H__
+#define __ASM_KVM_BOOK3S_H__
+
+#include <linux/types.h>
+#include <linux/kvm_host.h>
+#include <asm/kvm_ppc.h>
+
+struct kvmppc_slb {
+	u64 esid;
+	u64 vsid;
+	u64 orige;
+	u64 origv;
+	bool valid;
+	bool Ks;
+	bool Kp;
+	bool nx;
+	bool large;
+	bool class;
+};
+
+struct kvmppc_sr {
+	u32 raw;
+	u32 vsid;
+	bool Ks;
+	bool Kp;
+	bool nx;
+};
+
+struct kvmppc_bat {
+	u32 bepi;
+	u32 bepi_mask;
+	bool vs;
+	bool vp;
+	u32 brpn;
+	u8 wimg;
+	u8 pp;
+};
+
+struct kvmppc_sid_map {
+	u64 guest_vsid;
+	u64 guest_esid;
+	u64 host_vsid;
+	bool valid;
+};
+
+#define SID_MAP_BITS    9
+#define SID_MAP_NUM     (1 << SID_MAP_BITS)
+#define SID_MAP_MASK    (SID_MAP_NUM - 1)
+
+struct kvmppc_vcpu_book3s {
+	struct kvm_vcpu vcpu;
+	struct kvmppc_sid_map sid_map[SID_MAP_NUM];
+	struct kvmppc_slb slb[64];
+	struct {
+		u64 esid;
+		u64 vsid;
+	} slb_shadow[64];
+	u8 slb_shadow_max;
+	struct kvmppc_sr sr[16];
+	struct kvmppc_bat ibat[8];
+	struct kvmppc_bat dbat[8];
+	u64 hid[6];
+	int slb_nr;
+	u64 sdr1;
+	u64 dsisr;
+	u64 hior;
+	u64 msr_mask;
+	u64 vsid_first;
+	u64 vsid_next;
+	u64 vsid_max;
+	int context_id;
+};
+
+#define CONTEXT_HOST		0
+#define CONTEXT_GUEST		1
+#define CONTEXT_GUEST_END	2
+
+#define VSID_REAL	0xfffffffffff00000
+#define VSID_REAL_DR	0xffffffffffe00000
+#define VSID_REAL_IR	0xffffffffffd00000
+#define VSID_BAT	0xffffffffffc00000
+#define VSID_PR		0x8000000000000000
+
+extern void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, u64 ea, u64 ea_mask);
+extern void kvmppc_mmu_pte_vflush(struct kvm_vcpu *vcpu, u64 vp, u64 vp_mask);
+extern void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, u64 pa_start, u64 pa_end);
+extern void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 new_msr);
+extern void kvmppc_mmu_book3s_64_init(struct kvm_vcpu *vcpu);
+extern void kvmppc_mmu_book3s_32_init(struct kvm_vcpu *vcpu);
+extern int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte);
+extern int kvmppc_mmu_map_segment(struct kvm_vcpu *vcpu, ulong eaddr);
+extern void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu);
+extern struct kvmppc_pte *kvmppc_mmu_find_pte(struct kvm_vcpu *vcpu, u64 ea, bool data);
+extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong eaddr, int size, void *ptr, bool data);
+extern int kvmppc_st(struct kvm_vcpu *vcpu, ulong eaddr, int size, void *ptr);
+extern void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec);
+
+extern u32 kvmppc_trampoline_lowmem;
+extern u32 kvmppc_trampoline_enter;
+
+static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
+{
+	return container_of(vcpu, struct kvmppc_vcpu_book3s, vcpu);
+}
+
+static inline ulong dsisr(void)
+{
+	ulong r;
+	asm ( "mfdsisr %0 " : "=r" (r) );
+	return r;
+}
+
+extern void kvm_return_point(void);
+
+#define INS_DCBZ			0x7c0007ec
+
+#endif /* __ASM_KVM_BOOK3S_H__ */
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 06/27] Add Book3s_64 intercept helpers
  2009-10-21 15:03             ` Alexander Graf
@ 2009-10-21 15:03               ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti

We need to intercept interrupt vectors. To do that, let's add a file
we can always include which only activates the intercepts when we have
them configured.
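
In C terms this is the usual "expands to nothing unless configured" pattern.
The snippet below is only a loose analogy with a made-up hook name; the real
DO_KVM in this patch is an assembler macro that branches to the trampoline.

	/* Loose C analogy, illustrative only. */
	#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
	#define DO_KVM(intno)	example_intercept(intno)	/* hypothetical hook */
	#else
	#define DO_KVM(intno)	do { } while (0)		/* compiles away */
	#endif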

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/include/asm/kvm_book3s_64_asm.h |   58 ++++++++++++++++++++++++++
 1 files changed, 58 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kvm_book3s_64_asm.h

diff --git a/arch/powerpc/include/asm/kvm_book3s_64_asm.h b/arch/powerpc/include/asm/kvm_book3s_64_asm.h
new file mode 100644
index 0000000..2e06ee8
--- /dev/null
+++ b/arch/powerpc/include/asm/kvm_book3s_64_asm.h
@@ -0,0 +1,58 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+#ifndef __ASM_KVM_BOOK3S_ASM_H__
+#define __ASM_KVM_BOOK3S_ASM_H__
+
+#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
+
+#include <asm/kvm_asm.h>
+
+.macro DO_KVM intno
+	.if (\intno == BOOK3S_INTERRUPT_SYSTEM_RESET) || \
+	    (\intno == BOOK3S_INTERRUPT_MACHINE_CHECK) || \
+	    (\intno == BOOK3S_INTERRUPT_DATA_STORAGE) || \
+	    (\intno == BOOK3S_INTERRUPT_INST_STORAGE) || \
+	    (\intno == BOOK3S_INTERRUPT_DATA_SEGMENT) || \
+	    (\intno == BOOK3S_INTERRUPT_INST_SEGMENT) || \
+	    (\intno == BOOK3S_INTERRUPT_EXTERNAL) || \
+	    (\intno == BOOK3S_INTERRUPT_ALIGNMENT) || \
+	    (\intno == BOOK3S_INTERRUPT_PROGRAM) || \
+	    (\intno == BOOK3S_INTERRUPT_FP_UNAVAIL) || \
+	    (\intno == BOOK3S_INTERRUPT_DECREMENTER) || \
+	    (\intno == BOOK3S_INTERRUPT_SYSCALL) || \
+	    (\intno == BOOK3S_INTERRUPT_TRACE) || \
+	    (\intno == BOOK3S_INTERRUPT_PERFMON) || \
+	    (\intno == BOOK3S_INTERRUPT_ALTIVEC) || \
+	    (\intno == BOOK3S_INTERRUPT_VSX)
+
+	b	kvmppc_trampoline_\intno
+kvmppc_resume_\intno:
+
+	.endif
+.endm
+
+#else
+
+.macro DO_KVM intno
+.endm
+
+#endif /* CONFIG_KVM_BOOK3S_64_HANDLER */
+
+#endif /* __ASM_KVM_BOOK3S_ASM_H__ */
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 06/27] Add Book3s_64 intercept helpers
@ 2009-10-21 15:03               ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti

We need to intercept interrupt vectors. To do that, let's add a file
we can always include which only activates the intercepts when we have
them configured.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/include/asm/kvm_book3s_64_asm.h |   58 ++++++++++++++++++++++++++
 1 files changed, 58 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kvm_book3s_64_asm.h

diff --git a/arch/powerpc/include/asm/kvm_book3s_64_asm.h b/arch/powerpc/include/asm/kvm_book3s_64_asm.h
new file mode 100644
index 0000000..2e06ee8
--- /dev/null
+++ b/arch/powerpc/include/asm/kvm_book3s_64_asm.h
@@ -0,0 +1,58 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+#ifndef __ASM_KVM_BOOK3S_ASM_H__
+#define __ASM_KVM_BOOK3S_ASM_H__
+
+#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
+
+#include <asm/kvm_asm.h>
+
+.macro DO_KVM intno
+	.if (\intno == BOOK3S_INTERRUPT_SYSTEM_RESET) || \
+	    (\intno == BOOK3S_INTERRUPT_MACHINE_CHECK) || \
+	    (\intno == BOOK3S_INTERRUPT_DATA_STORAGE) || \
+	    (\intno == BOOK3S_INTERRUPT_INST_STORAGE) || \
+	    (\intno == BOOK3S_INTERRUPT_DATA_SEGMENT) || \
+	    (\intno == BOOK3S_INTERRUPT_INST_SEGMENT) || \
+	    (\intno == BOOK3S_INTERRUPT_EXTERNAL) || \
+	    (\intno == BOOK3S_INTERRUPT_ALIGNMENT) || \
+	    (\intno == BOOK3S_INTERRUPT_PROGRAM) || \
+	    (\intno == BOOK3S_INTERRUPT_FP_UNAVAIL) || \
+	    (\intno == BOOK3S_INTERRUPT_DECREMENTER) || \
+	    (\intno == BOOK3S_INTERRUPT_SYSCALL) || \
+	    (\intno == BOOK3S_INTERRUPT_TRACE) || \
+	    (\intno == BOOK3S_INTERRUPT_PERFMON) || \
+	    (\intno == BOOK3S_INTERRUPT_ALTIVEC) || \
+	    (\intno == BOOK3S_INTERRUPT_VSX)
+
+	b	kvmppc_trampoline_\intno
+kvmppc_resume_\intno:
+
+	.endif
+.endm
+
+#else
+
+.macro DO_KVM intno
+.endm
+
+#endif /* CONFIG_KVM_BOOK3S_64_HANDLER */
+
+#endif /* __ASM_KVM_BOOK3S_ASM_H__ */
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 07/27] Add book3s_64 highmem asm code
  2009-10-21 15:03               ` Alexander Graf
@ 2009-10-21 15:03                 ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti

This is the entry / exit code. In order to switch between host and guest
context, we need to switch register state and call the exit code handler on
exit.

This assembly file does exactly that. To finally enter the guest it calls
into book3s_64_slb.S. On exit it gets jumped to from book3s_64_slb.S too.
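
For orientation, a C-level sketch of how this entry point is meant to be
driven. The real caller lives in book3s.c, which comes later in this series;
the wrapper and the prototype below are assumptions mirroring the r3/r4
register comment in the file.

	/* Sketch only, not the actual caller. */
	extern int __kvmppc_vcpu_entry(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);

	static int example_run_guest(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
	{
		/* Saves host state, loads guest state, enters the guest and
		 * returns once kvmppc_handle_exit() asks for a heavyweight exit. */
		return __kvmppc_vcpu_entry(kvm_run, vcpu);
	}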

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v3 -> v4:

  - header rename fix
---
 arch/powerpc/include/asm/kvm_ppc.h      |    1 +
 arch/powerpc/kvm/book3s_64_interrupts.S |  392 +++++++++++++++++++++++++++++++
 2 files changed, 393 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_interrupts.S

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 2c6ee34..269ee46 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -39,6 +39,7 @@ enum emulation_result {
 extern int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
 extern char kvmppc_handlers_start[];
 extern unsigned long kvmppc_handler_len;
+extern void kvmppc_handler_highmem(void);
 
 extern void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu);
 extern int kvmppc_handle_load(struct kvm_run *run, struct kvm_vcpu *vcpu,
diff --git a/arch/powerpc/kvm/book3s_64_interrupts.S b/arch/powerpc/kvm/book3s_64_interrupts.S
new file mode 100644
index 0000000..7b55d80
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_interrupts.S
@@ -0,0 +1,392 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+#include <asm/ppc_asm.h>
+#include <asm/kvm_asm.h>
+#include <asm/reg.h>
+#include <asm/page.h>
+#include <asm/asm-offsets.h>
+#include <asm/exception-64s.h>
+
+#define KVMPPC_HANDLE_EXIT .kvmppc_handle_exit
+#define ULONG_SIZE 8
+#define VCPU_GPR(n)     (VCPU_GPRS + (n * ULONG_SIZE))
+
+.macro mfpaca tmp_reg, src_reg, offset, vcpu_reg
+	ld	\tmp_reg, (PACA_EXMC+\offset)(r13)
+	std	\tmp_reg, VCPU_GPR(\src_reg)(\vcpu_reg)
+.endm
+
+.macro DISABLE_INTERRUPTS
+       mfmsr   r0
+       rldicl  r0,r0,48,1
+       rotldi  r0,r0,16
+       mtmsrd  r0,1
+.endm
+
+/*****************************************************************************
+ *                                                                           *
+ *     Guest entry / exit code that is in kernel module memory (highmem)     *
+ *                                                                           *
+ ****************************************************************************/
+
+/* Registers:
+ *  r3: kvm_run pointer
+ *  r4: vcpu pointer
+ */
+_GLOBAL(__kvmppc_vcpu_entry)
+
+kvm_start_entry:
+	/* Write correct stack frame */
+	mflr    r0
+	std     r0,16(r1)
+
+	/* Save host state to the stack */
+	stdu	r1, -SWITCH_FRAME_SIZE(r1)
+
+	/* Save r3 (kvm_run) and r4 (vcpu) */
+	SAVE_2GPRS(3, r1)
+
+	/* Save non-volatile registers (r14 - r31) */
+	SAVE_NVGPRS(r1)
+
+	/* Save LR */
+	mflr	r14
+	std	r14, _LINK(r1)
+
+/* XXX optimize non-volatile loading away */
+kvm_start_lightweight:
+
+	DISABLE_INTERRUPTS
+
+	/* Save R1/R2 in the PACA */
+	std	r1, PACAR1(r13)
+	std	r2, (PACA_EXMC+EX_SRR0)(r13)
+	ld	r3, VCPU_HIGHMEM_HANDLER(r4)
+	std	r3, PACASAVEDMSR(r13)
+
+	/* Load non-volatile guest state from the vcpu */
+	ld	r14, VCPU_GPR(r14)(r4)
+	ld	r15, VCPU_GPR(r15)(r4)
+	ld	r16, VCPU_GPR(r16)(r4)
+	ld	r17, VCPU_GPR(r17)(r4)
+	ld	r18, VCPU_GPR(r18)(r4)
+	ld	r19, VCPU_GPR(r19)(r4)
+	ld	r20, VCPU_GPR(r20)(r4)
+	ld	r21, VCPU_GPR(r21)(r4)
+	ld	r22, VCPU_GPR(r22)(r4)
+	ld	r23, VCPU_GPR(r23)(r4)
+	ld	r24, VCPU_GPR(r24)(r4)
+	ld	r25, VCPU_GPR(r25)(r4)
+	ld	r26, VCPU_GPR(r26)(r4)
+	ld	r27, VCPU_GPR(r27)(r4)
+	ld	r28, VCPU_GPR(r28)(r4)
+	ld	r29, VCPU_GPR(r29)(r4)
+	ld	r30, VCPU_GPR(r30)(r4)
+	ld	r31, VCPU_GPR(r31)(r4)
+
+	ld	r9, VCPU_PC(r4)			/* r9 = vcpu->arch.pc */
+	ld	r10, VCPU_SHADOW_MSR(r4)	/* r10 = vcpu->arch.shadow_msr */
+
+	ld	r3, VCPU_TRAMPOLINE_ENTER(r4)
+	mtsrr0	r3
+
+	LOAD_REG_IMMEDIATE(r3, MSR_KERNEL & ~(MSR_IR | MSR_DR))
+	mtsrr1	r3
+
+	/* Load guest state in the respective registers */
+	lwz	r3, VCPU_CR(r4)		/* r3 = vcpu->arch.cr */
+	stw	r3, (PACA_EXMC + EX_CCR)(r13)
+
+	ld	r3, VCPU_CTR(r4)	/* r3 = vcpu->arch.ctr */
+	mtctr	r3			/* CTR = r3 */
+
+	ld	r3, VCPU_LR(r4)		/* r3 = vcpu->arch.lr */
+	mtlr	r3			/* LR = r3 */
+
+	ld	r3, VCPU_XER(r4)	/* r3 = vcpu->arch.xer */
+	std	r3, (PACA_EXMC + EX_R3)(r13)
+
+	/* Some guests may need to have dcbz set to 32 byte length.
+	 *
+	 * Usually we ensure that by patching the guest's instructions
+	 * to trap on dcbz and emulate it in the hypervisor.
+	 *
+	 * If we can, we should tell the CPU to use 32 byte dcbz though,
+	 * because that's a lot faster.
+	 */
+
+	ld	r3, VCPU_HFLAGS(r4)
+	rldicl.	r3, r3, 0, 63		/* CR = ((r3 & 1) == 0) */
+	beq	no_dcbz32_on
+
+	mfspr   r3,SPRN_HID5
+	ori     r3, r3, 0x80		/* XXX HID5_dcbz32 = 0x80 */
+	mtspr   SPRN_HID5,r3
+
+no_dcbz32_on:
+	/*	Load guest GPRs */
+
+	ld	r3, VCPU_GPR(r9)(r4)
+	std	r3, (PACA_EXMC + EX_R9)(r13)
+	ld	r3, VCPU_GPR(r10)(r4)
+	std	r3, (PACA_EXMC + EX_R10)(r13)
+	ld	r3, VCPU_GPR(r11)(r4)
+	std	r3, (PACA_EXMC + EX_R11)(r13)
+	ld	r3, VCPU_GPR(r12)(r4)
+	std	r3, (PACA_EXMC + EX_R12)(r13)
+	ld	r3, VCPU_GPR(r13)(r4)
+	std	r3, (PACA_EXMC + EX_R13)(r13)
+
+	ld	r0, VCPU_GPR(r0)(r4)
+	ld	r1, VCPU_GPR(r1)(r4)
+	ld	r2, VCPU_GPR(r2)(r4)
+	ld	r3, VCPU_GPR(r3)(r4)
+	ld	r5, VCPU_GPR(r5)(r4)
+	ld	r6, VCPU_GPR(r6)(r4)
+	ld	r7, VCPU_GPR(r7)(r4)
+	ld	r8, VCPU_GPR(r8)(r4)
+	ld	r4, VCPU_GPR(r4)(r4)
+
+	/* This sets the Magic value for the trampoline */
+
+	li	r11, 1
+	stb	r11, PACA_KVM_IN_GUEST(r13)
+
+	/* Jump to SLB patching handler and into our guest */
+	RFI
+
+/*
+ * This is the handler in module memory. It gets jumped at from the
+ * lowmem trampoline code, so it's basically the guest exit code.
+ *
+ */
+
+.global kvmppc_handler_highmem
+kvmppc_handler_highmem:
+
+	/*
+	 * Register usage at this point:
+	 *
+	 * R00   = guest R13
+	 * R01   = host R1
+	 * R02   = host R2
+	 * R10   = guest PC
+	 * R11   = guest MSR
+	 * R12   = exit handler id
+	 * R13   = PACA
+	 * PACA.exmc.R9    = guest R1
+	 * PACA.exmc.R10   = guest R10
+	 * PACA.exmc.R11   = guest R11
+	 * PACA.exmc.R12   = guest R12
+	 * PACA.exmc.R13   = guest R2
+	 * PACA.exmc.DAR   = guest DAR
+	 * PACA.exmc.DSISR = guest DSISR
+	 * PACA.exmc.LR    = guest instruction
+	 * PACA.exmc.CCR   = guest CR
+	 * PACA.exmc.SRR0  = guest R0
+	 *
+	 */
+
+	std	r3, (PACA_EXMC+EX_R3)(r13)
+
+	/* save the exit id in R3 */
+	mr	r3, r12
+
+	/* R12 = vcpu */
+	ld	r12, GPR4(r1)
+
+	/* Now save the guest state */
+
+	std	r0, VCPU_GPR(r13)(r12)
+	std	r4, VCPU_GPR(r4)(r12)
+	std	r5, VCPU_GPR(r5)(r12)
+	std	r6, VCPU_GPR(r6)(r12)
+	std	r7, VCPU_GPR(r7)(r12)
+	std	r8, VCPU_GPR(r8)(r12)
+	std	r9, VCPU_GPR(r9)(r12)
+
+	/* get registers from PACA */
+	mfpaca	r5, r0, EX_SRR0, r12
+	mfpaca	r5, r3, EX_R3, r12
+	mfpaca	r5, r1, EX_R9, r12
+	mfpaca	r5, r10, EX_R10, r12
+	mfpaca	r5, r11, EX_R11, r12
+	mfpaca	r5, r12, EX_R12, r12
+	mfpaca	r5, r2, EX_R13, r12
+
+	lwz	r5, (PACA_EXMC+EX_LR)(r13)
+	stw	r5, VCPU_LAST_INST(r12)
+
+	lwz	r5, (PACA_EXMC+EX_CCR)(r13)
+	stw	r5, VCPU_CR(r12)
+
+	ld	r5, VCPU_HFLAGS(r12)
+	rldicl.	r5, r5, 0, 63		/* CR = ((r5 & 1) == 0) */
+	beq	no_dcbz32_off
+
+	mfspr   r5,SPRN_HID5
+	rldimi  r5,r5,6,56
+	mtspr   SPRN_HID5,r5
+
+no_dcbz32_off:
+
+	/* XXX maybe skip on lightweight? */
+	std	r14, VCPU_GPR(r14)(r12)
+	std	r15, VCPU_GPR(r15)(r12)
+	std	r16, VCPU_GPR(r16)(r12)
+	std	r17, VCPU_GPR(r17)(r12)
+	std	r18, VCPU_GPR(r18)(r12)
+	std	r19, VCPU_GPR(r19)(r12)
+	std	r20, VCPU_GPR(r20)(r12)
+	std	r21, VCPU_GPR(r21)(r12)
+	std	r22, VCPU_GPR(r22)(r12)
+	std	r23, VCPU_GPR(r23)(r12)
+	std	r24, VCPU_GPR(r24)(r12)
+	std	r25, VCPU_GPR(r25)(r12)
+	std	r26, VCPU_GPR(r26)(r12)
+	std	r27, VCPU_GPR(r27)(r12)
+	std	r28, VCPU_GPR(r28)(r12)
+	std	r29, VCPU_GPR(r29)(r12)
+	std	r30, VCPU_GPR(r30)(r12)
+	std	r31, VCPU_GPR(r31)(r12)
+
+	/* Restore non-volatile host registers (r14 - r31) */
+	REST_NVGPRS(r1)
+
+	/* Save guest PC (R10) */
+	std	r10, VCPU_PC(r12)
+
+	/* Save guest msr (R11) */
+	std	r11, VCPU_SHADOW_MSR(r12)
+
+	/* Save guest CTR (in R12) */
+	mfctr	r5
+	std	r5, VCPU_CTR(r12)
+
+	/* Save guest LR */
+	mflr	r5
+	std	r5, VCPU_LR(r12)
+
+	/* Save guest XER */
+	mfxer	r5
+	std	r5, VCPU_XER(r12)
+
+	/* Save guest DAR */
+	ld	r5, (PACA_EXMC+EX_DAR)(r13)
+	std	r5, VCPU_FAULT_DEAR(r12)
+
+	/* Save guest DSISR */
+	lwz	r5, (PACA_EXMC+EX_DSISR)(r13)
+	std	r5, VCPU_FAULT_DSISR(r12)
+
+	/* Restore host msr -> SRR1 */
+	ld	r7, VCPU_HOST_MSR(r12)
+	mtsrr1	r7
+
+	/* Restore host IP -> SRR0 */
+	ld	r6, VCPU_HOST_RETIP(r12)
+	mtsrr0	r6
+
+	/*
+	 * For some interrupts, we need to call the real Linux
+	 * handler, so it can do work for us. This has to happen
+	 * as if the interrupt arrived from the kernel though,
+	 * so let's fake it here where most state is restored.
+	 *
+	 * Call Linux for hardware interrupts/decrementer
+	 * r3 = address of interrupt handler (exit reason)
+	 */
+
+	cmpwi	r3, BOOK3S_INTERRUPT_EXTERNAL
+	beq	call_linux_handler
+	cmpwi	r3, BOOK3S_INTERRUPT_DECREMENTER
+	beq	call_linux_handler
+
+	/* Back to Interruptible Mode! (goto kvm_return_point) */
+	RFI
+
+call_linux_handler:
+
+	/*
+	 * If we land here we need to jump back to the handler we
+	 * came from.
+	 *
+	 * We have a page that we can access from real mode, so let's
+	 * jump back to that and use it as a trampoline to get back into the
+	 * interrupt handler!
+	 *
+	 * R3 still contains the exit code,
+	 * R6 VCPU_HOST_RETIP and
+	 * R7 VCPU_HOST_MSR
+	 */
+
+	mtlr	r3
+
+	ld	r5, VCPU_TRAMPOLINE_LOWMEM(r12)
+	mtsrr0	r5
+	LOAD_REG_IMMEDIATE(r5, MSR_KERNEL & ~(MSR_IR | MSR_DR))
+	mtsrr1	r5
+
+	RFI
+
+.global kvm_return_point
+kvm_return_point:
+
+	/* Jump back to lightweight entry if we're supposed to */
+	/* go back into the guest */
+	mr	r5, r3
+	/* Restore r3 (kvm_run) and r4 (vcpu) */
+	REST_2GPRS(3, r1)
+	bl	KVMPPC_HANDLE_EXIT
+
+#if 0 /* XXX get lightweight exits back */
+	cmpwi	r3, RESUME_GUEST
+	bne	kvm_exit_heavyweight
+
+	/* put VCPU and KVM_RUN back into place and roll again! */
+	REST_2GPRS(3, r1)
+	b	kvm_start_lightweight
+
+kvm_exit_heavyweight:
+	/* Restore non-volatile host registers */
+	ld	r14, _LINK(r1)
+	mtlr	r14
+	REST_NVGPRS(r1)
+
+	addi    r1, r1, SWITCH_FRAME_SIZE
+#else
+	ld	r4, _LINK(r1)
+	mtlr	r4
+
+	cmpwi	r3, RESUME_GUEST
+	bne	kvm_exit_heavyweight
+
+	REST_2GPRS(3, r1)
+
+	addi    r1, r1, SWITCH_FRAME_SIZE
+
+	b	kvm_start_entry
+
+kvm_exit_heavyweight:
+
+	addi    r1, r1, SWITCH_FRAME_SIZE
+#endif
+
+	blr
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 07/27] Add book3s_64 highmem asm code
@ 2009-10-21 15:03                 ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti

This is the entry / exit code. In order to switch between host and guest
context, we need to switch register state and call the exit code handler on
exit.

This assembly file does exactly that. To finally enter the guest it calls
into book3s_64_slb.S. On exit it gets jumped to from book3s_64_slb.S too.

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v3 -> v4:

  - header rename fix
---
 arch/powerpc/include/asm/kvm_ppc.h      |    1 +
 arch/powerpc/kvm/book3s_64_interrupts.S |  392 +++++++++++++++++++++++++++++++
 2 files changed, 393 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_interrupts.S

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 2c6ee34..269ee46 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -39,6 +39,7 @@ enum emulation_result {
 extern int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
 extern char kvmppc_handlers_start[];
 extern unsigned long kvmppc_handler_len;
+extern void kvmppc_handler_highmem(void);
 
 extern void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu);
 extern int kvmppc_handle_load(struct kvm_run *run, struct kvm_vcpu *vcpu,
diff --git a/arch/powerpc/kvm/book3s_64_interrupts.S b/arch/powerpc/kvm/book3s_64_interrupts.S
new file mode 100644
index 0000000..7b55d80
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_interrupts.S
@@ -0,0 +1,392 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+#include <asm/ppc_asm.h>
+#include <asm/kvm_asm.h>
+#include <asm/reg.h>
+#include <asm/page.h>
+#include <asm/asm-offsets.h>
+#include <asm/exception-64s.h>
+
+#define KVMPPC_HANDLE_EXIT .kvmppc_handle_exit
+#define ULONG_SIZE 8
+#define VCPU_GPR(n)     (VCPU_GPRS + (n * ULONG_SIZE))
+
+.macro mfpaca tmp_reg, src_reg, offset, vcpu_reg
+	ld	\tmp_reg, (PACA_EXMC+\offset)(r13)
+	std	\tmp_reg, VCPU_GPR(\src_reg)(\vcpu_reg)
+.endm
+
+.macro DISABLE_INTERRUPTS
+       mfmsr   r0
+       rldicl  r0,r0,48,1
+       rotldi  r0,r0,16
+       mtmsrd  r0,1
+.endm
+
+/*****************************************************************************
+ *                                                                           *
+ *     Guest entry / exit code that is in kernel module memory (highmem)     *
+ *                                                                           *
+ ****************************************************************************/
+
+/* Registers:
+ *  r3: kvm_run pointer
+ *  r4: vcpu pointer
+ */
+_GLOBAL(__kvmppc_vcpu_entry)
+
+kvm_start_entry:
+	/* Write correct stack frame */
+	mflr    r0
+	std     r0,16(r1)
+
+	/* Save host state to the stack */
+	stdu	r1, -SWITCH_FRAME_SIZE(r1)
+
+	/* Save r3 (kvm_run) and r4 (vcpu) */
+	SAVE_2GPRS(3, r1)
+
+	/* Save non-volatile registers (r14 - r31) */
+	SAVE_NVGPRS(r1)
+
+	/* Save LR */
+	mflr	r14
+	std	r14, _LINK(r1)
+
+/* XXX optimize non-volatile loading away */
+kvm_start_lightweight:
+
+	DISABLE_INTERRUPTS
+
+	/* Save R1/R2 in the PACA */
+	std	r1, PACAR1(r13)
+	std	r2, (PACA_EXMC+EX_SRR0)(r13)
+	ld	r3, VCPU_HIGHMEM_HANDLER(r4)
+	std	r3, PACASAVEDMSR(r13)
+
+	/* Load non-volatile guest state from the vcpu */
+	ld	r14, VCPU_GPR(r14)(r4)
+	ld	r15, VCPU_GPR(r15)(r4)
+	ld	r16, VCPU_GPR(r16)(r4)
+	ld	r17, VCPU_GPR(r17)(r4)
+	ld	r18, VCPU_GPR(r18)(r4)
+	ld	r19, VCPU_GPR(r19)(r4)
+	ld	r20, VCPU_GPR(r20)(r4)
+	ld	r21, VCPU_GPR(r21)(r4)
+	ld	r22, VCPU_GPR(r22)(r4)
+	ld	r23, VCPU_GPR(r23)(r4)
+	ld	r24, VCPU_GPR(r24)(r4)
+	ld	r25, VCPU_GPR(r25)(r4)
+	ld	r26, VCPU_GPR(r26)(r4)
+	ld	r27, VCPU_GPR(r27)(r4)
+	ld	r28, VCPU_GPR(r28)(r4)
+	ld	r29, VCPU_GPR(r29)(r4)
+	ld	r30, VCPU_GPR(r30)(r4)
+	ld	r31, VCPU_GPR(r31)(r4)
+
+	ld	r9, VCPU_PC(r4)			/* r9 = vcpu->arch.pc */
+	ld	r10, VCPU_SHADOW_MSR(r4)	/* r10 = vcpu->arch.shadow_msr */
+
+	ld	r3, VCPU_TRAMPOLINE_ENTER(r4)
+	mtsrr0	r3
+
+	LOAD_REG_IMMEDIATE(r3, MSR_KERNEL & ~(MSR_IR | MSR_DR))
+	mtsrr1	r3
+
+	/* Load guest state in the respective registers */
+	lwz	r3, VCPU_CR(r4)		/* r3 = vcpu->arch.cr */
+	stw	r3, (PACA_EXMC + EX_CCR)(r13)
+
+	ld	r3, VCPU_CTR(r4)	/* r3 = vcpu->arch.ctr */
+	mtctr	r3			/* CTR = r3 */
+
+	ld	r3, VCPU_LR(r4)		/* r3 = vcpu->arch.lr */
+	mtlr	r3			/* LR = r3 */
+
+	ld	r3, VCPU_XER(r4)	/* r3 = vcpu->arch.xer */
+	std	r3, (PACA_EXMC + EX_R3)(r13)
+
+	/* Some guests may need to have dcbz set to 32 byte length.
+	 *
+	 * Usually we ensure that by patching the guest's instructions
+	 * to trap on dcbz and emulate it in the hypervisor.
+	 *
+	 * If we can, we should tell the CPU to use 32 byte dcbz though,
+	 * because that's a lot faster.
+	 */
+
+	ld	r3, VCPU_HFLAGS(r4)
+	rldicl.	r3, r3, 0, 63		/* CR = ((r3 & 1) == 0) */
+	beq	no_dcbz32_on
+
+	mfspr   r3,SPRN_HID5
+	ori     r3, r3, 0x80		/* XXX HID5_dcbz32 = 0x80 */
+	mtspr   SPRN_HID5,r3
+
+no_dcbz32_on:
+	/*	Load guest GPRs */
+
+	ld	r3, VCPU_GPR(r9)(r4)
+	std	r3, (PACA_EXMC + EX_R9)(r13)
+	ld	r3, VCPU_GPR(r10)(r4)
+	std	r3, (PACA_EXMC + EX_R10)(r13)
+	ld	r3, VCPU_GPR(r11)(r4)
+	std	r3, (PACA_EXMC + EX_R11)(r13)
+	ld	r3, VCPU_GPR(r12)(r4)
+	std	r3, (PACA_EXMC + EX_R12)(r13)
+	ld	r3, VCPU_GPR(r13)(r4)
+	std	r3, (PACA_EXMC + EX_R13)(r13)
+
+	ld	r0, VCPU_GPR(r0)(r4)
+	ld	r1, VCPU_GPR(r1)(r4)
+	ld	r2, VCPU_GPR(r2)(r4)
+	ld	r3, VCPU_GPR(r3)(r4)
+	ld	r5, VCPU_GPR(r5)(r4)
+	ld	r6, VCPU_GPR(r6)(r4)
+	ld	r7, VCPU_GPR(r7)(r4)
+	ld	r8, VCPU_GPR(r8)(r4)
+	ld	r4, VCPU_GPR(r4)(r4)
+
+	/* This sets the Magic value for the trampoline */
+
+	li	r11, 1
+	stb	r11, PACA_KVM_IN_GUEST(r13)
+
+	/* Jump to SLB patching handler and into our guest */
+	RFI
+
+/*
+ * This is the handler in module memory. It gets jumped at from the
+ * lowmem trampoline code, so it's basically the guest exit code.
+ *
+ */
+
+.global kvmppc_handler_highmem
+kvmppc_handler_highmem:
+
+	/*
+	 * Register usage at this point:
+	 *
+	 * R00   = guest R13
+	 * R01   = host R1
+	 * R02   = host R2
+	 * R10   = guest PC
+	 * R11   = guest MSR
+	 * R12   = exit handler id
+	 * R13   = PACA
+	 * PACA.exmc.R9    = guest R1
+	 * PACA.exmc.R10   = guest R10
+	 * PACA.exmc.R11   = guest R11
+	 * PACA.exmc.R12   = guest R12
+	 * PACA.exmc.R13   = guest R2
+	 * PACA.exmc.DAR   = guest DAR
+	 * PACA.exmc.DSISR = guest DSISR
+	 * PACA.exmc.LR    = guest instruction
+	 * PACA.exmc.CCR   = guest CR
+	 * PACA.exmc.SRR0  = guest R0
+	 *
+	 */
+
+	std	r3, (PACA_EXMC+EX_R3)(r13)
+
+	/* save the exit id in R3 */
+	mr	r3, r12
+
+	/* R12 = vcpu */
+	ld	r12, GPR4(r1)
+
+	/* Now save the guest state */
+
+	std	r0, VCPU_GPR(r13)(r12)
+	std	r4, VCPU_GPR(r4)(r12)
+	std	r5, VCPU_GPR(r5)(r12)
+	std	r6, VCPU_GPR(r6)(r12)
+	std	r7, VCPU_GPR(r7)(r12)
+	std	r8, VCPU_GPR(r8)(r12)
+	std	r9, VCPU_GPR(r9)(r12)
+
+	/* get registers from PACA */
+	mfpaca	r5, r0, EX_SRR0, r12
+	mfpaca	r5, r3, EX_R3, r12
+	mfpaca	r5, r1, EX_R9, r12
+	mfpaca	r5, r10, EX_R10, r12
+	mfpaca	r5, r11, EX_R11, r12
+	mfpaca	r5, r12, EX_R12, r12
+	mfpaca	r5, r2, EX_R13, r12
+
+	lwz	r5, (PACA_EXMC+EX_LR)(r13)
+	stw	r5, VCPU_LAST_INST(r12)
+
+	lwz	r5, (PACA_EXMC+EX_CCR)(r13)
+	stw	r5, VCPU_CR(r12)
+
+	ld	r5, VCPU_HFLAGS(r12)
+	rldicl.	r5, r5, 0, 63		/* CR = ((r5 & 1) == 0) */
+	beq	no_dcbz32_off
+
+	mfspr   r5,SPRN_HID5
+	rldimi  r5,r5,6,56
+	mtspr   SPRN_HID5,r5
+
+no_dcbz32_off:
+
+	/* XXX maybe skip on lightweight? */
+	std	r14, VCPU_GPR(r14)(r12)
+	std	r15, VCPU_GPR(r15)(r12)
+	std	r16, VCPU_GPR(r16)(r12)
+	std	r17, VCPU_GPR(r17)(r12)
+	std	r18, VCPU_GPR(r18)(r12)
+	std	r19, VCPU_GPR(r19)(r12)
+	std	r20, VCPU_GPR(r20)(r12)
+	std	r21, VCPU_GPR(r21)(r12)
+	std	r22, VCPU_GPR(r22)(r12)
+	std	r23, VCPU_GPR(r23)(r12)
+	std	r24, VCPU_GPR(r24)(r12)
+	std	r25, VCPU_GPR(r25)(r12)
+	std	r26, VCPU_GPR(r26)(r12)
+	std	r27, VCPU_GPR(r27)(r12)
+	std	r28, VCPU_GPR(r28)(r12)
+	std	r29, VCPU_GPR(r29)(r12)
+	std	r30, VCPU_GPR(r30)(r12)
+	std	r31, VCPU_GPR(r31)(r12)
+
+	/* Restore non-volatile host registers (r14 - r31) */
+	REST_NVGPRS(r1)
+
+	/* Save guest PC (R10) */
+	std	r10, VCPU_PC(r12)
+
+	/* Save guest msr (R11) */
+	std	r11, VCPU_SHADOW_MSR(r12)
+
+	/* Save guest CTR (in R12) */
+	mfctr	r5
+	std	r5, VCPU_CTR(r12)
+
+	/* Save guest LR */
+	mflr	r5
+	std	r5, VCPU_LR(r12)
+
+	/* Save guest XER */
+	mfxer	r5
+	std	r5, VCPU_XER(r12)
+
+	/* Save guest DAR */
+	ld	r5, (PACA_EXMC+EX_DAR)(r13)
+	std	r5, VCPU_FAULT_DEAR(r12)
+
+	/* Save guest DSISR */
+	lwz	r5, (PACA_EXMC+EX_DSISR)(r13)
+	std	r5, VCPU_FAULT_DSISR(r12)
+
+	/* Restore host msr -> SRR1 */
+	ld	r7, VCPU_HOST_MSR(r12)
+	mtsrr1	r7
+
+	/* Restore host IP -> SRR0 */
+	ld	r6, VCPU_HOST_RETIP(r12)
+	mtsrr0	r6
+
+	/*
+	 * For some interrupts, we need to call the real Linux
+	 * handler, so it can do work for us. This has to happen
+	 * as if the interrupt arrived from the kernel though,
+	 * so let's fake it here where most state is restored.
+	 *
+	 * Call Linux for hardware interrupts/decrementer
+	 * r3 = address of interrupt handler (exit reason)
+	 */
+
+	cmpwi	r3, BOOK3S_INTERRUPT_EXTERNAL
+	beq	call_linux_handler
+	cmpwi	r3, BOOK3S_INTERRUPT_DECREMENTER
+	beq	call_linux_handler
+
+	/* Back to Interruptible Mode! (goto kvm_return_point) */
+	RFI
+
+call_linux_handler:
+
+	/*
+	 * If we land here we need to jump back to the handler we
+	 * came from.
+	 *
+	 * We have a page that we can access from real mode, so let's
+	 * jump back to that and use it as a trampoline to get back into the
+	 * interrupt handler!
+	 *
+	 * R3 still contains the exit code,
+	 * R6 VCPU_HOST_RETIP and
+	 * R7 VCPU_HOST_MSR
+	 */
+
+	mtlr	r3
+
+	ld	r5, VCPU_TRAMPOLINE_LOWMEM(r12)
+	mtsrr0	r5
+	LOAD_REG_IMMEDIATE(r5, MSR_KERNEL & ~(MSR_IR | MSR_DR))
+	mtsrr1	r5
+
+	RFI
+
+.global kvm_return_point
+kvm_return_point:
+
+	/* Jump back to lightweight entry if we're supposed to */
+	/* go back into the guest */
+	mr	r5, r3
+	/* Restore r3 (kvm_run) and r4 (vcpu) */
+	REST_2GPRS(3, r1)
+	bl	KVMPPC_HANDLE_EXIT
+
+#if 0 /* XXX get lightweight exits back */
+	cmpwi	r3, RESUME_GUEST
+	bne	kvm_exit_heavyweight
+
+	/* put VCPU and KVM_RUN back into place and roll again! */
+	REST_2GPRS(3, r1)
+	b	kvm_start_lightweight
+
+kvm_exit_heavyweight:
+	/* Restore non-volatile host registers */
+	ld	r14, _LINK(r1)
+	mtlr	r14
+	REST_NVGPRS(r1)
+
+	addi    r1, r1, SWITCH_FRAME_SIZE
+#else
+	ld	r4, _LINK(r1)
+	mtlr	r4
+
+	cmpwi	r3, RESUME_GUEST
+	bne	kvm_exit_heavyweight
+
+	REST_2GPRS(3, r1)
+
+	addi    r1, r1, SWITCH_FRAME_SIZE
+
+	b	kvm_start_entry
+
+kvm_exit_heavyweight:
+
+	addi    r1, r1, SWITCH_FRAME_SIZE
+#endif
+
+	blr
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 08/27] Add SLB switching code for entry/exit
       [not found]                 ` <1256137413-15256-8-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
@ 2009-10-21 15:03                     ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti

This is the really low level of guest entry/exit code.

Book3s_64 has an SLB, which stores all ESID -> VSID mappings we're
currently aware of.

The segments in the guest differ from the ones on the host, so we need
to switch the SLB to tell the MMU that we're in a new context.

So we store a shadow of the guest's SLB in the PACA, switch to that on
entry and only restore bolted entries on exit, leaving the rest to the
Linux SLB fault handler.

That way we get a really clean way of switching the SLB.
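
The entry side boils down to the loop sketched here in C, illustrative only:
the struct mirrors the 16-byte esid/vsid pairs the assembler walks in the
PACA, and SLB_ESID_V is the architected valid bit that the rldicl test checks.

	/* Illustrative C rendering of the "fill SLB from the shadow" loop,
	 * assuming the usual kernel headers for u64 and SLB_ESID_V. */
	struct example_shadow_slbe {
		u64 esid;	/* offset 0 in each 16-byte shadow entry */
		u64 vsid;	/* offset 8 */
	};

	static void example_load_guest_slb(struct example_shadow_slbe *shadow, int max)
	{
		int i;

		for (i = 0; i < max; i++) {
			if (!(shadow[i].esid & SLB_ESID_V))
				continue;	/* invalid entry, skip it */
			asm volatile("slbmte %0,%1" : :
				     "r" (shadow[i].vsid), "r" (shadow[i].esid));
		}
	}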

Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
---
 arch/powerpc/kvm/book3s_64_slb.S |  277 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 277 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_slb.S

diff --git a/arch/powerpc/kvm/book3s_64_slb.S b/arch/powerpc/kvm/book3s_64_slb.S
new file mode 100644
index 0000000..00a8367
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_slb.S
@@ -0,0 +1,277 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
+ */
+
+/******************************************************************************
+ *                                                                            *
+ *                               Entry code                                   *
+ *                                                                            *
+ *****************************************************************************/
+
+.global kvmppc_handler_trampoline_enter
+kvmppc_handler_trampoline_enter:
+
+	/* Required state:
+	 *
+	 * MSR = ~IR|DR
+	 * R13 = PACA
+	 * R9 = guest IP
+	 * R10 = guest MSR
+	 * R11 = free
+	 * R12 = free
+	 * PACA[PACA_EXMC + EX_R9] = guest R9
+	 * PACA[PACA_EXMC + EX_R10] = guest R10
+	 * PACA[PACA_EXMC + EX_R11] = guest R11
+	 * PACA[PACA_EXMC + EX_R12] = guest R12
+	 * PACA[PACA_EXMC + EX_R13] = guest R13
+	 * PACA[PACA_EXMC + EX_CCR] = guest CR
+	 * PACA[PACA_EXMC + EX_R3] = guest XER
+	 */
+
+	mtsrr0	r9
+	mtsrr1	r10
+
+	mtspr	SPRN_SPRG_SCRATCH0, r0
+
+	/* Remove LPAR shadow entries */
+
+#if SLB_NUM_BOLTED == 3
+
+	ld	r12, PACA_SLBSHADOWPTR(r13)
+	ld	r10, 0x10(r12)
+	ld	r11, 0x18(r12)
+	/* Invalid? Skip. */
+	rldicl. r0, r10, 37, 63
+	beq	slb_entry_skip_1
+	xoris	r9, r10, SLB_ESID_V@h
+	std	r9, 0x10(r12)
+slb_entry_skip_1:
+	ld	r9, 0x20(r12)
+	/* Invalid? Skip. */
+	rldicl. r0, r9, 37, 63
+	beq	slb_entry_skip_2
+	xoris	r9, r9, SLB_ESID_V@h
+	std	r9, 0x20(r12)
+slb_entry_skip_2:
+	ld	r9, 0x30(r12)
+	/* Invalid? Skip. */
+	rldicl. r0, r9, 37, 63
+	beq	slb_entry_skip_3
+	xoris	r9, r9, SLB_ESID_V@h
+	std	r9, 0x30(r12)
+slb_entry_skip_3:
+	
+#else
+#error unknown number of bolted entries
+#endif
+
+	/* Flush SLB */
+
+	slbia
+
+	/* r0 = esid & ESID_MASK */
+	rldicr  r10, r10, 0, 35
+	/* r0 |= CLASS_BIT(VSID) */
+	rldic   r12, r11, 56 - 36, 36
+	or      r10, r10, r12
+	slbie	r10
+
+	isync
+
+	/* Fill SLB with our shadow */
+
+	lbz	r12, PACA_KVM_SLB_MAX(r13)
+	mulli	r12, r12, 16
+	addi	r12, r12, PACA_KVM_SLB
+	add	r12, r12, r13
+
+	/* for (r11 = kvm_slb; r11 < kvm_slb + kvm_slb_size; r11+=slb_entry) */
+	li	r11, PACA_KVM_SLB
+	add	r11, r11, r13
+
+slb_loop_enter:
+
+	ld	r10, 0(r11)
+
+	rldicl. r0, r10, 37, 63
+	beq	slb_loop_enter_skip
+
+	ld	r9, 8(r11)
+	slbmte	r9, r10
+
+slb_loop_enter_skip:
+	addi	r11, r11, 16
+	cmpd	cr0, r11, r12
+	blt	slb_loop_enter
+
+slb_do_enter:
+
+	/* Enter guest */
+
+	mfspr	r0, SPRN_SPRG_SCRATCH0
+
+	ld	r9, (PACA_EXMC+EX_R9)(r13)
+	ld	r10, (PACA_EXMC+EX_R10)(r13)
+	ld	r12, (PACA_EXMC+EX_R12)(r13)
+
+	lwz	r11, (PACA_EXMC+EX_CCR)(r13)
+	mtcr	r11
+
+	ld	r11, (PACA_EXMC+EX_R3)(r13)
+	mtxer	r11
+
+	ld	r11, (PACA_EXMC+EX_R11)(r13)
+	ld	r13, (PACA_EXMC+EX_R13)(r13)
+
+	RFI
+kvmppc_handler_trampoline_enter_end:
+
+
+
+/******************************************************************************
+ *                                                                            *
+ *                               Exit code                                    *
+ *                                                                            *
+ *****************************************************************************/
+
+.global kvmppc_handler_trampoline_exit
+kvmppc_handler_trampoline_exit:
+
+	/* Register usage at this point:
+	 *
+	 * SPRG_SCRATCH0 = guest R13
+	 * R01           = host R1
+	 * R02           = host R2
+	 * R10           = guest PC
+	 * R11           = guest MSR
+	 * R12           = exit handler id
+	 * R13           = PACA
+	 * PACA.exmc.CCR  = guest CR
+	 * PACA.exmc.R9  = guest R1
+	 * PACA.exmc.R10 = guest R10
+	 * PACA.exmc.R11 = guest R11
+	 * PACA.exmc.R12 = guest R12
+	 * PACA.exmc.R13 = guest R2
+	 *
+	 */
+
+	/* Save registers */
+
+	std	r0, (PACA_EXMC+EX_SRR0)(r13)
+	std	r9, (PACA_EXMC+EX_R3)(r13)
+	std	r10, (PACA_EXMC+EX_LR)(r13)
+	std	r11, (PACA_EXMC+EX_DAR)(r13)
+
+	/*
+	 * In order for us to easily get the last instruction
+	 * we got the #vmexit at, we exploit the fact that the
+	 * virtual layout is still the same here, so we can just
+	 * ld from the guest's PC address
+	 */
+
+	/* We only load the last instruction when it's safe */
+	cmpwi	r12, BOOK3S_INTERRUPT_DATA_STORAGE
+	beq	ld_last_inst
+	cmpwi	r12, BOOK3S_INTERRUPT_PROGRAM
+	beq	ld_last_inst
+
+	b	no_ld_last_inst
+
+ld_last_inst:
+	/* Save off the guest instruction we're at */
+	/*    1) enable paging for data */
+	mfmsr	r9
+	ori	r11, r9, MSR_DR			/* Enable paging for data */
+	mtmsr	r11
+	/*    2) fetch the instruction */
+	lwz	r0, 0(r10)
+	/*    3) disable paging again */
+	mtmsr	r9
+
+no_ld_last_inst:
+
+	/* Restore bolted entries from the shadow and fix it along the way */
+
+	/* We don't store anything in entry 0, so we don't need to take care of that */
+	slbia
+	isync
+
+#if SLB_NUM_BOLTED == 3
+
+	ld	r11, PACA_SLBSHADOWPTR(r13)
+
+	ld	r10, 0x10(r11)
+	cmpdi	r10, 0
+	beq	slb_exit_skip_1
+	oris	r10, r10, SLB_ESID_V@h
+	ld	r9, 0x18(r11)
+	slbmte	r9, r10
+	std	r10, 0x10(r11)
+slb_exit_skip_1:
+	
+	ld	r10, 0x20(r11)
+	cmpdi	r10, 0
+	beq	slb_exit_skip_2
+	oris	r10, r10, SLB_ESID_V@h
+	ld	r9, 0x28(r11)
+	slbmte	r9, r10
+	std	r10, 0x20(r11)
+slb_exit_skip_2:
+	
+	ld	r10, 0x30(r11)
+	cmpdi	r10, 0
+	beq	slb_exit_skip_3
+	oris	r10, r10, SLB_ESID_V@h
+	ld	r9, 0x38(r11)
+	slbmte	r9, r10
+	std	r10, 0x30(r11)
+slb_exit_skip_3:
+	
+#else
+#error unknown number of bolted entries
+#endif
+
+slb_do_exit:
+
+	/* Restore registers */
+
+	ld	r11, (PACA_EXMC+EX_DAR)(r13)
+	ld	r10, (PACA_EXMC+EX_LR)(r13)
+	ld	r9, (PACA_EXMC+EX_R3)(r13)
+
+	/* Save last inst */
+	stw	r0, (PACA_EXMC+EX_LR)(r13)
+
+	/* Save DAR and DSISR before going to paged mode */
+	mfdar	r0
+	std	r0, (PACA_EXMC+EX_DAR)(r13)
+	mfdsisr	r0
+	stw	r0, (PACA_EXMC+EX_DSISR)(r13)
+
+	/* RFI into the highmem handler */
+	mfmsr	r0
+	ori	r0, r0, MSR_IR|MSR_DR|MSR_RI	/* Enable paging */
+	mtsrr1	r0
+	ld	r0, PACASAVEDMSR(r13)		/* Highmem handler address */
+	mtsrr0	r0
+
+	mfspr	r0, SPRN_SPRG_SCRATCH0
+
+	RFI
+kvmppc_handler_trampoline_exit_end:
+
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 08/27] Add SLB switching code for entry/exit
@ 2009-10-21 15:03                     ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti

This is the really low level of guest entry/exit code.

Book3s_64 has an SLB, which stores all ESID -> VSID mappings we're
currently aware of.

The segments in the guest differ from the ones on the host, so we need
to switch the SLB to tell the MMU that we're in a new context.

So we store a shadow of the guest's SLB in the PACA, switch to that on
entry and only restore bolted entries on exit, leaving the rest to the
Linux SLB fault handler.

That way we get a really clean way of switching the SLB.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kvm/book3s_64_slb.S |  277 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 277 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_slb.S

diff --git a/arch/powerpc/kvm/book3s_64_slb.S b/arch/powerpc/kvm/book3s_64_slb.S
new file mode 100644
index 0000000..00a8367
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_slb.S
@@ -0,0 +1,277 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+/******************************************************************************
+ *                                                                            *
+ *                               Entry code                                   *
+ *                                                                            *
+ *****************************************************************************/
+
+.global kvmppc_handler_trampoline_enter
+kvmppc_handler_trampoline_enter:
+
+	/* Required state:
+	 *
+	 * MSR = ~IR|DR
+	 * R13 = PACA
+	 * R9 = guest IP
+	 * R10 = guest MSR
+	 * R11 = free
+	 * R12 = free
+	 * PACA[PACA_EXMC + EX_R9] = guest R9
+	 * PACA[PACA_EXMC + EX_R10] = guest R10
+	 * PACA[PACA_EXMC + EX_R11] = guest R11
+	 * PACA[PACA_EXMC + EX_R12] = guest R12
+	 * PACA[PACA_EXMC + EX_R13] = guest R13
+	 * PACA[PACA_EXMC + EX_CCR] = guest CR
+	 * PACA[PACA_EXMC + EX_R3] = guest XER
+	 */
+
+	mtsrr0	r9
+	mtsrr1	r10
+
+	mtspr	SPRN_SPRG_SCRATCH0, r0
+
+	/* Remove LPAR shadow entries */
+
+#if SLB_NUM_BOLTED == 3
+
+	ld	r12, PACA_SLBSHADOWPTR(r13)
+	ld	r10, 0x10(r12)
+	ld	r11, 0x18(r12)
+	/* Invalid? Skip. */
+	rldicl. r0, r10, 37, 63
+	beq	slb_entry_skip_1
+	xoris	r9, r10, SLB_ESID_V@h
+	std	r9, 0x10(r12)
+slb_entry_skip_1:
+	ld	r9, 0x20(r12)
+	/* Invalid? Skip. */
+	rldicl. r0, r9, 37, 63
+	beq	slb_entry_skip_2
+	xoris	r9, r9, SLB_ESID_V@h
+	std	r9, 0x20(r12)
+slb_entry_skip_2:
+	ld	r9, 0x30(r12)
+	/* Invalid? Skip. */
+	rldicl. r0, r9, 37, 63
+	beq	slb_entry_skip_3
+	xoris	r9, r9, SLB_ESID_V@h
+	std	r9, 0x30(r12)
+slb_entry_skip_3:
+	
+#else
+#error unknown number of bolted entries
+#endif
+
+	/* Flush SLB */
+
+	slbia
+
+	/* r0 = esid & ESID_MASK */
+	rldicr  r10, r10, 0, 35
+	/* r0 |= CLASS_BIT(VSID) */
+	rldic   r12, r11, 56 - 36, 36
+	or      r10, r10, r12
+	slbie	r10
+
+	isync
+
+	/* Fill SLB with our shadow */
+
+	lbz	r12, PACA_KVM_SLB_MAX(r13)
+	mulli	r12, r12, 16
+	addi	r12, r12, PACA_KVM_SLB
+	add	r12, r12, r13
+
+	/* for (r11 = kvm_slb; r11 < kvm_slb + kvm_slb_size; r11+=slb_entry) */
+	li	r11, PACA_KVM_SLB
+	add	r11, r11, r13
+
+slb_loop_enter:
+
+	ld	r10, 0(r11)
+
+	rldicl. r0, r10, 37, 63
+	beq	slb_loop_enter_skip
+
+	ld	r9, 8(r11)
+	slbmte	r9, r10
+
+slb_loop_enter_skip:
+	addi	r11, r11, 16
+	cmpd	cr0, r11, r12
+	blt	slb_loop_enter
+
+slb_do_enter:
+
+	/* Enter guest */
+
+	mfspr	r0, SPRN_SPRG_SCRATCH0
+
+	ld	r9, (PACA_EXMC+EX_R9)(r13)
+	ld	r10, (PACA_EXMC+EX_R10)(r13)
+	ld	r12, (PACA_EXMC+EX_R12)(r13)
+
+	lwz	r11, (PACA_EXMC+EX_CCR)(r13)
+	mtcr	r11
+
+	ld	r11, (PACA_EXMC+EX_R3)(r13)
+	mtxer	r11
+
+	ld	r11, (PACA_EXMC+EX_R11)(r13)
+	ld	r13, (PACA_EXMC+EX_R13)(r13)
+
+	RFI
+kvmppc_handler_trampoline_enter_end:
+
+
+
+/******************************************************************************
+ *                                                                            *
+ *                               Exit code                                    *
+ *                                                                            *
+ *****************************************************************************/
+
+.global kvmppc_handler_trampoline_exit
+kvmppc_handler_trampoline_exit:
+
+	/* Register usage at this point:
+	 *
+	 * SPRG_SCRATCH0 = guest R13
+	 * R01           = host R1
+	 * R02           = host R2
+	 * R10           = guest PC
+	 * R11           = guest MSR
+	 * R12           = exit handler id
+	 * R13           = PACA
+	 * PACA.exmc.CCR  = guest CR
+	 * PACA.exmc.R9  = guest R1
+	 * PACA.exmc.R10 = guest R10
+	 * PACA.exmc.R11 = guest R11
+	 * PACA.exmc.R12 = guest R12
+	 * PACA.exmc.R13 = guest R2
+	 *
+	 */
+
+	/* Save registers */
+
+	std	r0, (PACA_EXMC+EX_SRR0)(r13)
+	std	r9, (PACA_EXMC+EX_R3)(r13)
+	std	r10, (PACA_EXMC+EX_LR)(r13)
+	std	r11, (PACA_EXMC+EX_DAR)(r13)
+
+	/*
+	 * In order for us to easily get the last instruction,
+	 * we got the #vmexit at, we exploit the fact that the
+	 * virtual layout is still the same here, so we can just
+	 * ld from the guest's PC address
+	 */
+
+	/* We only load the last instruction when it's safe */
+	cmpwi	r12, BOOK3S_INTERRUPT_DATA_STORAGE
+	beq	ld_last_inst
+	cmpwi	r12, BOOK3S_INTERRUPT_PROGRAM
+	beq	ld_last_inst
+
+	b	no_ld_last_inst
+
+ld_last_inst:
+	/* Save off the guest instruction we're at */
+	/*    1) enable paging for data */
+	mfmsr	r9
+	ori	r11, r9, MSR_DR			/* Enable paging for data */
+	mtmsr	r11
+	/*    2) fetch the instruction */
+	lwz	r0, 0(r10)
+	/*    3) disable paging again */
+	mtmsr	r9
+
+no_ld_last_inst:
+
+	/* Restore bolted entries from the shadow and fix it along the way */
+
+	/* We don't store anything in entry 0, so we don't need to take care of that */
+	slbia
+	isync
+
+#if SLB_NUM_BOLTED == 3
+
+	ld	r11, PACA_SLBSHADOWPTR(r13)
+
+	ld	r10, 0x10(r11)
+	cmpdi	r10, 0
+	beq	slb_exit_skip_1
+	oris	r10, r10, SLB_ESID_V@h
+	ld	r9, 0x18(r11)
+	slbmte	r9, r10
+	std	r10, 0x10(r11)
+slb_exit_skip_1:
+	
+	ld	r10, 0x20(r11)
+	cmpdi	r10, 0
+	beq	slb_exit_skip_2
+	oris	r10, r10, SLB_ESID_V@h
+	ld	r9, 0x28(r11)
+	slbmte	r9, r10
+	std	r10, 0x20(r11)
+slb_exit_skip_2:
+	
+	ld	r10, 0x30(r11)
+	cmpdi	r10, 0
+	beq	slb_exit_skip_3
+	oris	r10, r10, SLB_ESID_V@h
+	ld	r9, 0x38(r11)
+	slbmte	r9, r10
+	std	r10, 0x30(r11)
+slb_exit_skip_3:
+	
+#else
+#error unknown number of bolted entries
+#endif
+
+slb_do_exit:
+
+	/* Restore registers */
+
+	ld	r11, (PACA_EXMC+EX_DAR)(r13)
+	ld	r10, (PACA_EXMC+EX_LR)(r13)
+	ld	r9, (PACA_EXMC+EX_R3)(r13)
+
+	/* Save last inst */
+	stw	r0, (PACA_EXMC+EX_LR)(r13)
+
+	/* Save DAR and DSISR before going to paged mode */
+	mfdar	r0
+	std	r0, (PACA_EXMC+EX_DAR)(r13)
+	mfdsisr	r0
+	stw	r0, (PACA_EXMC+EX_DSISR)(r13)
+
+	/* RFI into the highmem handler */
+	mfmsr	r0
+	ori	r0, r0, MSR_IR|MSR_DR|MSR_RI	/* Enable paging */
+	mtsrr1	r0
+	ld	r0, PACASAVEDMSR(r13)		/* Highmem handler address */
+	mtsrr0	r0
+
+	mfspr	r0, SPRN_SPRG_SCRATCH0
+
+	RFI
+kvmppc_handler_trampoline_exit_end:
+
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 09/27] Add interrupt handling code
       [not found]                     ` <1256137413-15256-9-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
@ 2009-10-21 15:03                         ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti

Getting from host state to the guest is only half the story. We also need
to return to our host context and handle whatever happened to get us out of
the guest.

On PowerPC every guest exit is an interrupt. So all we need to do is trap
the host's interrupt handlers and get into our #VMEXIT code to handle it.

PowerPCs also have a register that can add an offset to the interrupt handlers'
addresses, which is what the booke KVM code uses. Unfortunately that is a
hypervisor resource and we also want to be able to run KVM when we're running
in an LPAR. So we have to hook into the Linux interrupt handlers.
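
Roughly, each generated trampoline does the equivalent of the following C
sketch. This is only an illustration of the real-mode assembly in the patch
below, with the PACA flag access simplified; linux_handler() and kvm_vmexit()
are made-up names standing in for "branch to kvmppc_resume_<intno>" and
"branch to kvmppc_handler_trampoline_exit":

	/* Illustrative sketch only - the real code is the assembly macro
	 * below. linux_handler() and kvm_vmexit() are hypothetical
	 * stand-ins for the branches in the trampoline. */
	void trampoline(int intno)
	{
		if (!get_paca()->kvm_in_guest) {
			/* Not a guest exit - give the interrupt back to
			 * the regular Linux handler for this vector. */
			linux_handler(intno);
			return;
		}

		/* A guest was running: clear the magic flag, remember
		 * which interrupt hit us and branch into the KVM
		 * #VMEXIT path. */
		get_paca()->kvm_in_guest = 0;
		kvm_vmexit(intno);
	}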

Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>

---

v3 -> v4:

  - header rename fix
---
 arch/powerpc/kvm/book3s_64_rmhandlers.S |  131 +++++++++++++++++++++++++++++++
 1 files changed, 131 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_rmhandlers.S

diff --git a/arch/powerpc/kvm/book3s_64_rmhandlers.S b/arch/powerpc/kvm/book3s_64_rmhandlers.S
new file mode 100644
index 0000000..fb7dd2e
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_rmhandlers.S
@@ -0,0 +1,131 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
+ */
+
+#include <asm/ppc_asm.h>
+#include <asm/kvm_asm.h>
+#include <asm/reg.h>
+#include <asm/page.h>
+#include <asm/asm-offsets.h>
+#include <asm/exception-64s.h>
+
+/*****************************************************************************
+ *                                                                           *
+ *        Real Mode handlers that need to be in low physical memory          *
+ *                                                                           *
+ ****************************************************************************/
+
+
+.macro INTERRUPT_TRAMPOLINE intno
+
+.global kvmppc_trampoline_\intno
+kvmppc_trampoline_\intno:
+
+	mtspr	SPRN_SPRG_SCRATCH0, r13		/* Save r13 */
+
+	/*
+	 * First thing to do is to find out if we're coming
+	 * from a KVM guest or a Linux process.
+	 *
+	 * To distinguish, we check a magic byte in the PACA
+	 */
+	mfspr	r13, SPRN_SPRG_PACA		/* r13 = PACA */
+	std	r12, (PACA_EXMC + EX_R12)(r13)
+	mfcr	r12
+	stw	r12, (PACA_EXMC + EX_CCR)(r13)
+	lbz	r12, PACA_KVM_IN_GUEST(r13)
+	cmpwi	r12, 0
+	bne	..kvmppc_handler_hasmagic_\intno
+	/* No KVM guest? Then jump back to the Linux handler! */
+	lwz	r12, (PACA_EXMC + EX_CCR)(r13)
+	mtcr	r12
+	ld	r12, (PACA_EXMC + EX_R12)(r13)
+	mfspr	r13, SPRN_SPRG_SCRATCH0		/* r13 = original r13 */
+	b	kvmppc_resume_\intno		/* Get back original handler */
+
+	/* Now we know we're handling a KVM guest */
+..kvmppc_handler_hasmagic_\intno:
+	/* Unset guest state */
+	li	r12, 0
+	stb	r12, PACA_KVM_IN_GUEST(r13)
+
+	std	r1, (PACA_EXMC+EX_R9)(r13)
+	std	r10, (PACA_EXMC+EX_R10)(r13)
+	std	r11, (PACA_EXMC+EX_R11)(r13)
+	std	r2, (PACA_EXMC+EX_R13)(r13)
+
+	mfsrr0	r10
+	mfsrr1	r11
+
+	/* Restore R1/R2 so we can handle faults */
+	ld	r1, PACAR1(r13)
+	ld	r2, (PACA_EXMC+EX_SRR0)(r13)
+
+	/* Let's store which interrupt we're handling */
+	li	r12, \intno
+
+	/* Jump into the SLB exit code that goes to the highmem handler */
+	b	kvmppc_handler_trampoline_exit
+
+.endm
+
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_SYSTEM_RESET
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_MACHINE_CHECK
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_DATA_STORAGE
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_DATA_SEGMENT
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_INST_STORAGE
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_INST_SEGMENT
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_EXTERNAL
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_ALIGNMENT
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_PROGRAM
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_FP_UNAVAIL
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_DECREMENTER
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_SYSCALL
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_TRACE
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_PERFMON
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_ALTIVEC
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_VSX
+
+/*
+ * This trampoline brings us back to a real mode handler
+ *
+ * Input Registers:
+ *
+ * R6 = SRR0
+ * R7 = SRR1
+ * LR = real-mode IP
+ *
+ */
+.global kvmppc_handler_lowmem_trampoline
+kvmppc_handler_lowmem_trampoline:
+
+	mtsrr0	r6
+	mtsrr1	r7
+	blr
+kvmppc_handler_lowmem_trampoline_end:
+
+.global kvmppc_trampoline_lowmem
+kvmppc_trampoline_lowmem:
+	.long kvmppc_handler_lowmem_trampoline - _stext
+
+.global kvmppc_trampoline_enter
+kvmppc_trampoline_enter:
+	.long kvmppc_handler_trampoline_enter - _stext
+
+#include "book3s_64_slb.S"
+
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 09/27] Add interrupt handling code
@ 2009-10-21 15:03                         ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti

Getting from host state to the guest is only half the story. We also need
to return to our host context and handle whatever happened to get us out of
the guest.

On PowerPC every guest exit is an interrupt. So all we need to do is trap
the host's interrupt handlers and get into our #VMEXIT code to handle it.

PowerPCs also have a register that can add an offset to the interrupt handlers'
addresses, which is what the booke KVM code uses. Unfortunately that is a
hypervisor resource and we also want to be able to run KVM when we're running
in an LPAR. So we have to hook into the Linux interrupt handlers.

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v3 -> v4:

  - header rename fix
---
 arch/powerpc/kvm/book3s_64_rmhandlers.S |  131 +++++++++++++++++++++++++++++++
 1 files changed, 131 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_rmhandlers.S

diff --git a/arch/powerpc/kvm/book3s_64_rmhandlers.S b/arch/powerpc/kvm/book3s_64_rmhandlers.S
new file mode 100644
index 0000000..fb7dd2e
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_rmhandlers.S
@@ -0,0 +1,131 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+#include <asm/ppc_asm.h>
+#include <asm/kvm_asm.h>
+#include <asm/reg.h>
+#include <asm/page.h>
+#include <asm/asm-offsets.h>
+#include <asm/exception-64s.h>
+
+/*****************************************************************************
+ *                                                                           *
+ *        Real Mode handlers that need to be in low physical memory          *
+ *                                                                           *
+ ****************************************************************************/
+
+
+.macro INTERRUPT_TRAMPOLINE intno
+
+.global kvmppc_trampoline_\intno
+kvmppc_trampoline_\intno:
+
+	mtspr	SPRN_SPRG_SCRATCH0, r13		/* Save r13 */
+
+	/*
+	 * First thing to do is to find out if we're coming
+	 * from a KVM guest or a Linux process.
+	 *
+	 * To distinguish, we check a magic byte in the PACA
+	 */
+	mfspr	r13, SPRN_SPRG_PACA		/* r13 = PACA */
+	std	r12, (PACA_EXMC + EX_R12)(r13)
+	mfcr	r12
+	stw	r12, (PACA_EXMC + EX_CCR)(r13)
+	lbz	r12, PACA_KVM_IN_GUEST(r13)
+	cmpwi	r12, 0
+	bne	..kvmppc_handler_hasmagic_\intno
+	/* No KVM guest? Then jump back to the Linux handler! */
+	lwz	r12, (PACA_EXMC + EX_CCR)(r13)
+	mtcr	r12
+	ld	r12, (PACA_EXMC + EX_R12)(r13)
+	mfspr	r13, SPRN_SPRG_SCRATCH0		/* r13 = original r13 */
+	b	kvmppc_resume_\intno		/* Get back original handler */
+
+	/* Now we know we're handling a KVM guest */
+..kvmppc_handler_hasmagic_\intno:
+	/* Unset guest state */
+	li	r12, 0
+	stb	r12, PACA_KVM_IN_GUEST(r13)
+
+	std	r1, (PACA_EXMC+EX_R9)(r13)
+	std	r10, (PACA_EXMC+EX_R10)(r13)
+	std	r11, (PACA_EXMC+EX_R11)(r13)
+	std	r2, (PACA_EXMC+EX_R13)(r13)
+
+	mfsrr0	r10
+	mfsrr1	r11
+
+	/* Restore R1/R2 so we can handle faults */
+	ld	r1, PACAR1(r13)
+	ld	r2, (PACA_EXMC+EX_SRR0)(r13)
+
+	/* Let's store which interrupt we're handling */
+	li	r12, \intno
+
+	/* Jump into the SLB exit code that goes to the highmem handler */
+	b	kvmppc_handler_trampoline_exit
+
+.endm
+
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_SYSTEM_RESET
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_MACHINE_CHECK
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_DATA_STORAGE
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_DATA_SEGMENT
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_INST_STORAGE
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_INST_SEGMENT
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_EXTERNAL
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_ALIGNMENT
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_PROGRAM
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_FP_UNAVAIL
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_DECREMENTER
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_SYSCALL
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_TRACE
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_PERFMON
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_ALTIVEC
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_VSX
+
+/*
+ * This trampoline brings us back to a real mode handler
+ *
+ * Input Registers:
+ *
+ * R6 = SRR0
+ * R7 = SRR1
+ * LR = real-mode IP
+ *
+ */
+.global kvmppc_handler_lowmem_trampoline
+kvmppc_handler_lowmem_trampoline:
+
+	mtsrr0	r6
+	mtsrr1	r7
+	blr
+kvmppc_handler_lowmem_trampoline_end:
+
+.global kvmppc_trampoline_lowmem
+kvmppc_trampoline_lowmem:
+	.long kvmppc_handler_lowmem_trampoline - _stext
+
+.global kvmppc_trampoline_enter
+kvmppc_trampoline_enter:
+	.long kvmppc_handler_trampoline_enter - _stext
+
+#include "book3s_64_slb.S"
+
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 10/27] Add book3s.c
       [not found]                         ` <1256137413-15256-10-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
@ 2009-10-21 15:03                             ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti

This adds the book3s core handling file. Here everything that is generic to
desktop PowerPC cores is handled, including interrupt injections, MSR settings,
etc.

It basically takes over the same role as booke.c for embedded PowerPCs.
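
For orientation, the interrupt injection handled in this file
(kvmppc_book3s_queue_irqprio() and kvmppc_core_deliver_interrupts() below)
boils down to roughly the following condensed C sketch; vec_to_prio() and
try_deliver() are made-up names for the vector-to-priority switch and the
per-priority delivery helper in the real code:

	/* Condensed sketch of the queue/deliver flow in this file.
	 * vec_to_prio() and try_deliver() are hypothetical names. */
	static void queue_irq(struct kvm_vcpu *vcpu, unsigned int vec)
	{
		/* Map the interrupt vector onto a priority and mark it
		 * pending for delivery on the next guest entry. */
		set_bit(vec_to_prio(vec), &vcpu->arch.pending_exceptions);
	}

	static void deliver_irqs(struct kvm_vcpu *vcpu)
	{
		unsigned long *pending = &vcpu->arch.pending_exceptions;
		unsigned int prio;

		if (!*pending)
			return;

		/* Walk the pending priorities; an interrupt is only
		 * delivered when the guest MSR allows it (e.g. MSR_EE
		 * for DEC and external interrupts). */
		prio = __ffs(*pending);
		while (prio < BITS_PER_LONG) {
			if (try_deliver(vcpu, prio)) {
				clear_bit(prio, pending);
				break;
			}
			prio = find_next_bit(pending, BITS_PER_LONG, prio + 1);
		}
	}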

Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>

---

v3 -> v4:

  - use context_id instead of mm_alloc

v4 -> v5:

  - make pvr 32 bits
---
 arch/powerpc/kvm/book3s.c |  919 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 919 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s.c

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
new file mode 100644
index 0000000..0f4305b
--- /dev/null
+++ b/arch/powerpc/kvm/book3s.c
@@ -0,0 +1,919 @@
+/*
+ * Copyright (C) 2009. SUSE Linux Products GmbH. All rights reserved.
+ *
+ * Authors:
+ *    Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
+ *    Kevin Wolf <mail-vbj5DHeKsUHgbcAU4aOf7A@public.gmane.org>
+ *
+ * Description:
+ * This file is derived from arch/powerpc/kvm/44x.c,
+ * by Hollis Blanchard <hollisb-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kvm_host.h>
+#include <linux/err.h>
+
+#include <asm/reg.h>
+#include <asm/cputable.h>
+#include <asm/cacheflush.h>
+#include <asm/tlbflush.h>
+#include <asm/uaccess.h>
+#include <asm/io.h>
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+#include <asm/mmu_context.h>
+#include <linux/sched.h>
+#include <linux/vmalloc.h>
+
+#define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU
+
+// #define EXIT_DEBUG
+// #define EXIT_DEBUG_SIMPLE
+
+// #define AGGRESSIVE_DEC
+
+struct kvm_stats_debugfs_item debugfs_entries[] = {
+	{ "exits",       VCPU_STAT(sum_exits) },
+	{ "mmio",        VCPU_STAT(mmio_exits) },
+	{ "sig",         VCPU_STAT(signal_exits) },
+	{ "sysc",        VCPU_STAT(syscall_exits) },
+	{ "inst_emu",    VCPU_STAT(emulated_inst_exits) },
+	{ "dec",         VCPU_STAT(dec_exits) },
+	{ "ext_intr",    VCPU_STAT(ext_intr_exits) },
+	{ "queue_intr",  VCPU_STAT(queue_intr) },
+	{ "halt_wakeup", VCPU_STAT(halt_wakeup) },
+	{ "pf_storage",  VCPU_STAT(pf_storage) },
+	{ "sp_storage",  VCPU_STAT(sp_storage) },
+	{ "pf_instruc",  VCPU_STAT(pf_instruc) },
+	{ "sp_instruc",  VCPU_STAT(sp_instruc) },
+	{ "ld",          VCPU_STAT(ld) },
+	{ "ld_slow",     VCPU_STAT(ld_slow) },
+	{ "st",          VCPU_STAT(st) },
+	{ "st_slow",     VCPU_STAT(st_slow) },
+	{ NULL }
+};
+
+void kvmppc_core_load_host_debugstate(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvmppc_core_load_guest_debugstate(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+	memcpy(get_paca()->kvm_slb, to_book3s(vcpu)->slb_shadow, sizeof(get_paca()->kvm_slb));
+	get_paca()->kvm_slb_max = to_book3s(vcpu)->slb_shadow_max;
+}
+
+void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
+{
+	memcpy(to_book3s(vcpu)->slb_shadow, get_paca()->kvm_slb, sizeof(get_paca()->kvm_slb));
+	to_book3s(vcpu)->slb_shadow_max = get_paca()->kvm_slb_max;
+}
+
+#if defined(AGGRESSIVE_DEC) || defined(EXIT_DEBUG)
+static u32 kvmppc_get_dec(struct kvm_vcpu *vcpu)
+{
+	u64 jd = mftb() - vcpu->arch.dec_jiffies;
+	return vcpu->arch.dec - jd;
+}
+#endif
+
+void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
+{
+	ulong old_msr = vcpu->arch.msr;
+
+#ifdef EXIT_DEBUG
+	printk(KERN_INFO "KVM: Set MSR to 0x%llx\n", msr);
+#endif
+	msr &= to_book3s(vcpu)->msr_mask;
+	vcpu->arch.msr = msr;
+	vcpu->arch.shadow_msr = msr | MSR_USER32;
+	vcpu->arch.shadow_msr &= ( MSR_VEC | MSR_VSX | MSR_FP | MSR_FE0 |
+				   MSR_USER64 | MSR_SE | MSR_BE | MSR_DE |
+				   MSR_FE1);
+
+	if (msr & (MSR_WE|MSR_POW)) {
+		if (!vcpu->arch.pending_exceptions) {
+			kvm_vcpu_block(vcpu);
+			vcpu->stat.halt_wakeup++;
+		}
+	}
+
+	if (((vcpu->arch.msr & (MSR_IR|MSR_DR)) != (old_msr & (MSR_IR|MSR_DR))) ||
+	    (vcpu->arch.msr & MSR_PR) != (old_msr & MSR_PR)) {
+		kvmppc_mmu_flush_segments(vcpu);
+		kvmppc_mmu_map_segment(vcpu, vcpu->arch.pc);
+	}
+}
+
+void kvmppc_inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 flags)
+{
+	vcpu->arch.srr0 = vcpu->arch.pc;
+	vcpu->arch.srr1 = vcpu->arch.msr | flags;
+	vcpu->arch.pc = to_book3s(vcpu)->hior + vec;
+	vcpu->arch.mmu.reset_msr(vcpu);
+}
+
+void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec)
+{
+	unsigned int prio;
+
+	vcpu->stat.queue_intr++;
+	switch (vec) {
+	case 0x100: prio = BOOK3S_IRQPRIO_SYSTEM_RESET;		break;
+	case 0x200: prio = BOOK3S_IRQPRIO_MACHINE_CHECK;	break;
+	case 0x300: prio = BOOK3S_IRQPRIO_DATA_STORAGE;		break;
+	case 0x380: prio = BOOK3S_IRQPRIO_DATA_SEGMENT;		break;
+	case 0x400: prio = BOOK3S_IRQPRIO_INST_STORAGE;		break;
+	case 0x480: prio = BOOK3S_IRQPRIO_INST_SEGMENT;		break;
+	case 0x500: prio = BOOK3S_IRQPRIO_EXTERNAL;		break;
+	case 0x600: prio = BOOK3S_IRQPRIO_ALIGNMENT;		break;
+	case 0x700: prio = BOOK3S_IRQPRIO_PROGRAM;		break;
+	case 0x800: prio = BOOK3S_IRQPRIO_FP_UNAVAIL;		break;
+	case 0x900: prio = BOOK3S_IRQPRIO_DECREMENTER;		break;
+	case 0xc00: prio = BOOK3S_IRQPRIO_SYSCALL;		break;
+	case 0xd00: prio = BOOK3S_IRQPRIO_DEBUG;		break;
+	case 0xf20: prio = BOOK3S_IRQPRIO_ALTIVEC;		break;
+	case 0xf40: prio = BOOK3S_IRQPRIO_VSX;			break;
+	default:    prio = BOOK3S_IRQPRIO_MAX;			break;
+	}
+
+	set_bit(prio, &vcpu->arch.pending_exceptions);
+#ifdef EXIT_DEBUG
+	printk(KERN_INFO "Queueing interrupt %x\n", vec);
+#endif
+}
+
+
+void kvmppc_core_queue_program(struct kvm_vcpu *vcpu)
+{
+	kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_PROGRAM);
+}
+
+void kvmppc_core_queue_dec(struct kvm_vcpu *vcpu)
+{
+	kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_DECREMENTER);
+}
+
+int kvmppc_core_pending_dec(struct kvm_vcpu *vcpu)
+{
+	return test_bit(BOOK3S_INTERRUPT_DECREMENTER >> 7, &vcpu->arch.pending_exceptions);
+}
+
+void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
+                                struct kvm_interrupt *irq)
+{
+	kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_EXTERNAL);
+}
+
+int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
+{
+	int deliver = 1;
+	int vec = 0;
+
+	switch (priority) {
+	case BOOK3S_IRQPRIO_DECREMENTER:
+		deliver = vcpu->arch.msr & MSR_EE;
+		vec = BOOK3S_INTERRUPT_DECREMENTER;
+		break;
+	case BOOK3S_IRQPRIO_EXTERNAL:
+		deliver = vcpu->arch.msr & MSR_EE;
+		vec = BOOK3S_INTERRUPT_EXTERNAL;
+		break;
+	case BOOK3S_IRQPRIO_SYSTEM_RESET:
+		vec = BOOK3S_INTERRUPT_SYSTEM_RESET;
+		break;
+	case BOOK3S_IRQPRIO_MACHINE_CHECK:
+		vec = BOOK3S_INTERRUPT_MACHINE_CHECK;
+		break;
+	case BOOK3S_IRQPRIO_DATA_STORAGE:
+		vec = BOOK3S_INTERRUPT_DATA_STORAGE;
+		break;
+	case BOOK3S_IRQPRIO_INST_STORAGE:
+		vec = BOOK3S_INTERRUPT_INST_STORAGE;
+		break;
+	case BOOK3S_IRQPRIO_DATA_SEGMENT:
+		vec = BOOK3S_INTERRUPT_DATA_SEGMENT;
+		break;
+	case BOOK3S_IRQPRIO_INST_SEGMENT:
+		vec = BOOK3S_INTERRUPT_INST_SEGMENT;
+		break;
+	case BOOK3S_IRQPRIO_ALIGNMENT:
+		vec = BOOK3S_INTERRUPT_ALIGNMENT;
+		break;
+	case BOOK3S_IRQPRIO_PROGRAM:
+		vec = BOOK3S_INTERRUPT_PROGRAM;
+		break;
+	case BOOK3S_IRQPRIO_VSX:
+		vec = BOOK3S_INTERRUPT_VSX;
+		break;
+	case BOOK3S_IRQPRIO_ALTIVEC:
+		vec = BOOK3S_INTERRUPT_ALTIVEC;
+		break;
+	case BOOK3S_IRQPRIO_FP_UNAVAIL:
+		vec = BOOK3S_INTERRUPT_FP_UNAVAIL;
+		break;
+	case BOOK3S_IRQPRIO_SYSCALL:
+		vec = BOOK3S_INTERRUPT_SYSCALL;
+		break;
+	case BOOK3S_IRQPRIO_DEBUG:
+		vec = BOOK3S_INTERRUPT_TRACE;
+		break;
+	case BOOK3S_IRQPRIO_PERFORMANCE_MONITOR:
+		vec = BOOK3S_INTERRUPT_PERFMON;
+		break;
+	default:
+		deliver = 0;
+		printk(KERN_ERR "KVM: Unknown interrupt: 0x%x\n", priority);
+		break;
+	}
+
+#if 0
+	printk(KERN_INFO "Deliver interrupt 0x%x? %x\n", vec, deliver);
+#endif
+
+	if (deliver)
+		kvmppc_inject_interrupt(vcpu, vec, 0ULL);
+
+	return deliver;
+}
+
+void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu)
+{
+	unsigned long *pending = &vcpu->arch.pending_exceptions;
+	unsigned int priority;
+
+	/* XXX be more clever here - no need to mftb() on every entry */
+	/* Issue DEC again if it's still active */
+#ifdef AGGRESSIVE_DEC
+	if (vcpu->arch.msr & MSR_EE)
+		if (kvmppc_get_dec(vcpu) & 0x80000000)
+			kvmppc_core_queue_dec(vcpu);
+#endif
+
+#ifdef EXIT_DEBUG
+	if (vcpu->arch.pending_exceptions)
+		printk(KERN_EMERG "KVM: Check pending: %lx\n", vcpu->arch.pending_exceptions);
+#endif
+	priority = __ffs(*pending);
+	while (priority <= (sizeof(unsigned int) * 8)) {
+		if (kvmppc_book3s_irqprio_deliver(vcpu, priority)) {
+			clear_bit(priority, &vcpu->arch.pending_exceptions);
+			break;
+		}
+
+		priority = find_next_bit(pending,
+					 BITS_PER_BYTE * sizeof(*pending),
+					 priority + 1);
+	}
+}
+
+void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr)
+{
+	vcpu->arch.pvr = pvr;
+	if ((pvr >= 0x330000) && (pvr < 0x70330000)) {
+		kvmppc_mmu_book3s_64_init(vcpu);
+		to_book3s(vcpu)->hior = 0xfff00000;
+		to_book3s(vcpu)->msr_mask = 0xffffffffffffffffULL;
+	} else {
+		kvmppc_mmu_book3s_32_init(vcpu);
+		to_book3s(vcpu)->hior = 0;
+		to_book3s(vcpu)->msr_mask = 0xffffffffULL;
+	}
+
+	/* If we are in hypervisor level on 970, we can tell the CPU to
+	 * treat DCBZ as 32 bytes store */
+	vcpu->arch.hflags &= ~BOOK3S_HFLAG_DCBZ32;
+	if (vcpu->arch.mmu.is_dcbz32(vcpu) && (mfmsr() & MSR_HV) &&
+	    !strcmp(cur_cpu_spec->platform, "ppc970"))
+		vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
+
+}
+
+/* Book3s_32 CPUs always have a 32 byte cache line size, which Linux assumes. To
+ * make Book3s_32 Linux work on Book3s_64, we have to make sure we trap dcbz to
+ * emulate a 32 byte dcbz length.
+ *
+ * The Book3s_64 inventors also realized this case and implemented a special bit
+ * in the HID5 register, which is a hypervisor resource. Thus we can't use it.
+ *
+ * My approach here is to patch the dcbz instruction on executing pages.
+ */
+static void kvmppc_patch_dcbz(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte)
+{
+	bool touched = false;
+	hva_t hpage;
+	u32 *page;
+	int i;
+
+	hpage = gfn_to_hva(vcpu->kvm, pte->raddr >> PAGE_SHIFT);
+	if (kvm_is_error_hva(hpage))
+		return;
+
+	hpage |= pte->raddr & ~PAGE_MASK;
+	hpage &= ~0xFFFULL;
+
+	page = vmalloc(HW_PAGE_SIZE);
+
+	if (copy_from_user(page, (void __user *)hpage, HW_PAGE_SIZE))
+		goto out;
+
+	for (i=0; i < HW_PAGE_SIZE / 4; i++)
+		if ((page[i] & 0xff0007ff) == INS_DCBZ) {
+			page[i] &= 0xfffffff7; // reserved instruction, so we trap
+			touched = true;
+		}
+
+	if (touched)
+		copy_to_user((void __user *)hpage, page, HW_PAGE_SIZE);
+
+out:
+	vfree(page);
+}
+
+static int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, bool data,
+			 struct kvmppc_pte *pte)
+{
+	int relocated = (vcpu->arch.msr & (data ? MSR_DR : MSR_IR));
+	int r;
+
+	if (relocated) {
+		r = vcpu->arch.mmu.xlate(vcpu, eaddr, pte, data);
+	} else {
+		pte->eaddr = eaddr;
+		pte->raddr = eaddr & 0xffffffff;
+		pte->vpage = eaddr >> 12;
+		switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
+		case 0:
+			pte->vpage |= VSID_REAL;
+		case MSR_DR:
+			pte->vpage |= VSID_REAL_DR;
+		case MSR_IR:
+			pte->vpage |= VSID_REAL_IR;
+		}
+		pte->may_read = true;
+		pte->may_write = true;
+		pte->may_execute = true;
+		r = 0;
+	}
+
+	return r;
+}
+
+static hva_t kvmppc_bad_hva(void)
+{
+	return PAGE_OFFSET;
+}
+
+static hva_t kvmppc_pte_to_hva(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte,
+			       bool read)
+{
+	hva_t hpage;
+
+	if (read && !pte->may_read)
+		goto err;
+
+	if (!read && !pte->may_write)
+		goto err;
+
+	hpage = gfn_to_hva(vcpu->kvm, pte->raddr >> PAGE_SHIFT);
+	if (kvm_is_error_hva(hpage))
+		goto err;
+
+	return hpage | (pte->raddr & ~PAGE_MASK);
+err:
+	return kvmppc_bad_hva();
+}
+
+int kvmppc_st(struct kvm_vcpu *vcpu, ulong eaddr, int size, void *ptr)
+{
+	struct kvmppc_pte pte;
+	hva_t hva = eaddr;
+
+	vcpu->stat.st++;
+
+	if (kvmppc_xlate(vcpu, eaddr, false, &pte))
+		goto err;
+
+	hva = kvmppc_pte_to_hva(vcpu, &pte, false);
+	if (kvm_is_error_hva(hva))
+		goto err;
+
+	if (copy_to_user((void __user *)hva, ptr, size)) {
+		printk(KERN_INFO "kvmppc_st at 0x%lx failed\n", hva);
+		goto err;
+	}
+
+	return 0;
+
+err:
+	return -ENOENT;
+}
+
+int kvmppc_ld(struct kvm_vcpu *vcpu, ulong eaddr, int size, void *ptr,
+		      bool data)
+{
+	struct kvmppc_pte pte;
+	hva_t hva = eaddr;
+
+	vcpu->stat.ld++;
+
+	if (kvmppc_xlate(vcpu, eaddr, data, &pte))
+		goto err;
+
+	hva = kvmppc_pte_to_hva(vcpu, &pte, true);
+	if (kvm_is_error_hva(hva))
+		goto err;
+
+	if (copy_from_user(ptr, (void __user *)hva, size)) {
+		printk(KERN_INFO "kvmppc_ld at 0x%lx failed\n", hva);
+		goto err;
+	}
+
+	return 0;
+
+err:
+	return -ENOENT;
+}
+
+static int kvmppc_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+	return kvm_is_visible_gfn(vcpu->kvm, gfn);
+}
+
+int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu,
+			    ulong eaddr, int vec)
+{
+	bool data = (vec == BOOK3S_INTERRUPT_DATA_STORAGE);
+	int r = RESUME_GUEST;
+	int relocated;
+	int page_found = 0;
+	struct kvmppc_pte pte;
+	bool is_mmio = false;
+
+	if ( vec == BOOK3S_INTERRUPT_DATA_STORAGE ) {
+		relocated = (vcpu->arch.msr & MSR_DR);
+	} else {
+		relocated = (vcpu->arch.msr & MSR_IR);
+	}
+
+	/* Resolve real address if translation turned on */
+	if (relocated) {
+		page_found = vcpu->arch.mmu.xlate(vcpu, eaddr, &pte, data);
+	} else {
+		pte.may_execute = true;
+		pte.may_read = true;
+		pte.may_write = true;
+		pte.raddr = eaddr & 0xffffffff;
+		pte.eaddr = eaddr;
+		pte.vpage = eaddr >> 12;
+		switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
+		case 0:
+			pte.vpage |= VSID_REAL;
+		case MSR_DR:
+			pte.vpage |= VSID_REAL_DR;
+		case MSR_IR:
+			pte.vpage |= VSID_REAL_IR;
+		}
+	}
+
+	if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
+	   (!(vcpu->arch.hflags & BOOK3S_HFLAG_DCBZ32))) {
+		/*
+		 * If we do the dcbz hack, we have to NX on every execution,
+		 * so we can patch the executing code. This renders our guest
+		 * NX-less.
+		 */
+		pte.may_execute = !data;
+	}
+
+	if (page_found == -ENOENT) {
+		/* Page not found in guest PTE entries */
+		vcpu->arch.dear = vcpu->arch.fault_dear;
+		to_book3s(vcpu)->dsisr = vcpu->arch.fault_dsisr;
+		vcpu->arch.msr |= (vcpu->arch.shadow_msr & 0x00000000f8000000ULL);
+		kvmppc_book3s_queue_irqprio(vcpu, vec);
+	} else if (page_found == -EPERM) {
+		/* Storage protection */
+		vcpu->arch.dear = vcpu->arch.fault_dear;
+		to_book3s(vcpu)->dsisr = vcpu->arch.fault_dsisr & ~DSISR_NOHPTE;
+		to_book3s(vcpu)->dsisr |= DSISR_PROTFAULT;
+		vcpu->arch.msr |= (vcpu->arch.shadow_msr & 0x00000000f8000000ULL);
+		kvmppc_book3s_queue_irqprio(vcpu, vec);
+	} else if (page_found == -EINVAL) {
+		/* Page not found in guest SLB */
+		vcpu->arch.dear = vcpu->arch.fault_dear;
+		kvmppc_book3s_queue_irqprio(vcpu, vec + 0x80);
+	} else if (!is_mmio &&
+		   kvmppc_visible_gfn(vcpu, pte.raddr >> PAGE_SHIFT)) {
+		/* The guest's PTE is not mapped yet. Map on the host */
+		kvmppc_mmu_map_page(vcpu, &pte);
+		if (data)
+			vcpu->stat.sp_storage++;
+		else if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
+			(!(vcpu->arch.hflags & BOOK3S_HFLAG_DCBZ32)))
+			kvmppc_patch_dcbz(vcpu, &pte);
+	} else {
+		/* MMIO */
+		vcpu->stat.mmio_exits++;
+		vcpu->arch.paddr_accessed = pte.raddr;
+		r = kvmppc_emulate_mmio(run, vcpu);
+		if ( r == RESUME_HOST_NV )
+			r = RESUME_HOST;
+		if ( r == RESUME_GUEST_NV )
+			r = RESUME_GUEST;
+	}
+
+	return r;
+}
+
+int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
+                       unsigned int exit_nr)
+{
+	int r = RESUME_HOST;
+
+	vcpu->stat.sum_exits++;
+
+	run->exit_reason = KVM_EXIT_UNKNOWN;
+	run->ready_for_interrupt_injection = 1;
+#ifdef EXIT_DEBUG
+	printk(KERN_EMERG "exit_nr=0x%x | pc=0x%lx | dar=0x%lx | dec=0x%x | msr=0x%lx\n",
+		exit_nr, vcpu->arch.pc, vcpu->arch.fault_dear,
+		kvmppc_get_dec(vcpu), vcpu->arch.msr);
+#elif defined (EXIT_DEBUG_SIMPLE)
+	if ((exit_nr != 0x900) && (exit_nr != 0x500))
+		printk(KERN_EMERG "exit_nr=0x%x | pc=0x%lx | dar=0x%lx | msr=0x%lx\n",
+			exit_nr, vcpu->arch.pc, vcpu->arch.fault_dear,
+			vcpu->arch.msr);
+#endif
+	kvm_resched(vcpu);
+	switch (exit_nr) {
+	case BOOK3S_INTERRUPT_INST_STORAGE:
+		vcpu->stat.pf_instruc++;
+		/* only care about PTEG not found errors, but leave NX alone */
+		if (vcpu->arch.shadow_msr & 0x40000000) {
+			r = kvmppc_handle_pagefault(run, vcpu, vcpu->arch.pc, exit_nr);
+			vcpu->stat.sp_instruc++;
+		} else if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
+			  (!(vcpu->arch.hflags & BOOK3S_HFLAG_DCBZ32))) {
+			/*
+			 * XXX If we do the dcbz hack we use the NX bit to flush&patch the page,
+			 *     so we can't use the NX bit inside the guest. Let's cross our fingers,
+			 *     that no guest that needs the dcbz hack does NX.
+			 */
+			kvmppc_mmu_pte_flush(vcpu, vcpu->arch.pc, ~0xFFFULL);
+		} else {
+			vcpu->arch.msr |= (vcpu->arch.shadow_msr & 0x58000000);
+			kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+			kvmppc_mmu_pte_flush(vcpu, vcpu->arch.pc, ~0xFFFULL);
+			r = RESUME_GUEST;
+		}
+		break;
+	case BOOK3S_INTERRUPT_DATA_STORAGE:
+		vcpu->stat.pf_storage++;
+		/* The only case we need to handle is missing shadow PTEs */
+		if (vcpu->arch.fault_dsisr & DSISR_NOHPTE) {
+			r = kvmppc_handle_pagefault(run, vcpu, vcpu->arch.fault_dear, exit_nr);
+		} else {
+			vcpu->arch.dear = vcpu->arch.fault_dear;
+			to_book3s(vcpu)->dsisr = vcpu->arch.fault_dsisr;
+			kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+			kvmppc_mmu_pte_flush(vcpu, vcpu->arch.dear, ~0xFFFULL);
+			r = RESUME_GUEST;
+		}
+		break;
+	case BOOK3S_INTERRUPT_DATA_SEGMENT:
+		if (kvmppc_mmu_map_segment(vcpu, vcpu->arch.fault_dear) < 0) {
+			vcpu->arch.dear = vcpu->arch.fault_dear;
+			kvmppc_book3s_queue_irqprio(vcpu,
+				BOOK3S_INTERRUPT_DATA_SEGMENT);
+		}
+		r = RESUME_GUEST;
+		break;
+	case BOOK3S_INTERRUPT_INST_SEGMENT:
+		if (kvmppc_mmu_map_segment(vcpu, vcpu->arch.pc) < 0) {
+			kvmppc_book3s_queue_irqprio(vcpu,
+				BOOK3S_INTERRUPT_INST_SEGMENT);
+		}
+		r = RESUME_GUEST;
+		break;
+	/* We're good on these - the host merely wanted to get our attention */
+	case BOOK3S_INTERRUPT_DECREMENTER:
+		vcpu->stat.dec_exits++;
+		r = RESUME_GUEST;
+		break;
+	case BOOK3S_INTERRUPT_EXTERNAL:
+		vcpu->stat.ext_intr_exits++;
+		r = RESUME_GUEST;
+		break;
+	case BOOK3S_INTERRUPT_PROGRAM:
+	{
+		enum emulation_result er;
+
+		if (vcpu->arch.msr & MSR_PR) {
+#ifdef EXIT_DEBUG
+			printk(KERN_INFO "Userspace triggered 0x700 exception at 0x%lx (0x%x)\n", vcpu->arch.pc, vcpu->arch.last_inst);
+#endif
+			if ((vcpu->arch.last_inst & 0xff0007ff) !=
+			    (INS_DCBZ & 0xfffffff7)) {
+				kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+				r = RESUME_GUEST;
+				break;
+			}
+		}
+
+		vcpu->stat.emulated_inst_exits++;
+		er = kvmppc_emulate_instruction(run, vcpu);
+		switch (er) {
+		case EMULATE_DONE:
+			r = RESUME_GUEST;
+			break;
+		case EMULATE_FAIL:
+			printk(KERN_CRIT "%s: emulation at %lx failed (%08x)\n",
+			       __func__, vcpu->arch.pc, vcpu->arch.last_inst);
+			kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+			r = RESUME_GUEST;
+			break;
+		default:
+			BUG();
+		}
+		break;
+	}
+	case BOOK3S_INTERRUPT_SYSCALL:
+#ifdef EXIT_DEBUG
+		printk(KERN_INFO "Syscall Nr %d\n", (int)vcpu->arch.gpr[0]);
+#endif
+		vcpu->stat.syscall_exits++;
+		kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+		r = RESUME_GUEST;
+		break;
+	case BOOK3S_INTERRUPT_MACHINE_CHECK:
+	case BOOK3S_INTERRUPT_FP_UNAVAIL:
+	case BOOK3S_INTERRUPT_TRACE:
+	case BOOK3S_INTERRUPT_ALTIVEC:
+	case BOOK3S_INTERRUPT_VSX:
+		kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+		r = RESUME_GUEST;
+		break;
+	default:
+		/* Ugh - bork here! What did we get? */
+		printk(KERN_EMERG "exit_nr=0x%x | pc=0x%lx | msr=0x%lx\n", exit_nr, vcpu->arch.pc, vcpu->arch.shadow_msr);
+		r = RESUME_HOST;
+		BUG();
+		break;
+	}
+
+
+	if (!(r & RESUME_HOST)) {
+		/* To avoid clobbering exit_reason, only check for signals if
+		 * we aren't already exiting to userspace for some other
+		 * reason. */
+		if (signal_pending(current)) {
+#ifdef EXIT_DEBUG
+			printk(KERN_EMERG "KVM: Going back to host\n");
+#endif
+			vcpu->stat.signal_exits++;
+			run->exit_reason = KVM_EXIT_INTR;
+			r = -EINTR;
+		} else {
+			/* In case an interrupt came in that was triggered
+			 * from userspace (like DEC), we need to check what
+			 * to inject now! */
+			kvmppc_core_deliver_interrupts(vcpu);
+		}
+	}
+
+#ifdef EXIT_DEBUG
+	printk(KERN_EMERG "KVM exit: vcpu=0x%p pc=0x%lx r=0x%x\n", vcpu, vcpu->arch.pc, r);
+#endif
+
+	return r;
+}
+
+int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	int i;
+
+	regs->pc = vcpu->arch.pc;
+	regs->cr = vcpu->arch.cr;
+	regs->ctr = vcpu->arch.ctr;
+	regs->lr = vcpu->arch.lr;
+	regs->xer = vcpu->arch.xer;
+	regs->msr = vcpu->arch.msr;
+	regs->srr0 = vcpu->arch.srr0;
+	regs->srr1 = vcpu->arch.srr1;
+	regs->pid = vcpu->arch.pid;
+	regs->sprg0 = vcpu->arch.sprg0;
+	regs->sprg1 = vcpu->arch.sprg1;
+	regs->sprg2 = vcpu->arch.sprg2;
+	regs->sprg3 = vcpu->arch.sprg3;
+	regs->sprg5 = vcpu->arch.sprg4;
+	regs->sprg6 = vcpu->arch.sprg5;
+	regs->sprg7 = vcpu->arch.sprg6;
+
+	for (i = 0; i < ARRAY_SIZE(regs->gpr); i++)
+		regs->gpr[i] = vcpu->arch.gpr[i];
+
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	int i;
+
+	vcpu->arch.pc = regs->pc;
+	vcpu->arch.cr = regs->cr;
+	vcpu->arch.ctr = regs->ctr;
+	vcpu->arch.lr = regs->lr;
+	vcpu->arch.xer = regs->xer;
+	kvmppc_set_msr(vcpu, regs->msr);
+	vcpu->arch.srr0 = regs->srr0;
+	vcpu->arch.srr1 = regs->srr1;
+	vcpu->arch.sprg0 = regs->sprg0;
+	vcpu->arch.sprg1 = regs->sprg1;
+	vcpu->arch.sprg2 = regs->sprg2;
+	vcpu->arch.sprg3 = regs->sprg3;
+	vcpu->arch.sprg5 = regs->sprg4;
+	vcpu->arch.sprg6 = regs->sprg5;
+	vcpu->arch.sprg7 = regs->sprg6;
+
+	for (i = 0; i < ARRAY_SIZE(vcpu->arch.gpr); i++)
+		vcpu->arch.gpr[i] = regs->gpr[i];
+
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
+                                  struct kvm_sregs *sregs)
+{
+	sregs->pvr = vcpu->arch.pvr;
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
+                                  struct kvm_sregs *sregs)
+{
+	kvmppc_set_pvr(vcpu, sregs->pvr);
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -ENOTSUPP;
+}
+
+int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -ENOTSUPP;
+}
+
+int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
+                                  struct kvm_translation *tr)
+{
+	return 0;
+}
+
+/*
+ * Get (and clear) the dirty memory log for a memory slot.
+ */
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
+				      struct kvm_dirty_log *log)
+{
+	int r;
+	int n;
+	struct kvm_memory_slot *memslot;
+	int is_dirty = 0;
+
+	down_write(&kvm->slots_lock);
+
+	r = kvm_get_dirty_log(kvm, log, &is_dirty);
+	if (r)
+		goto out;
+
+	/* If nothing is dirty, don't bother messing with page tables. */
+	if (is_dirty) {
+		memslot = &kvm->memslots[log->slot];
+		for (n = 0; n < atomic_read(&kvm->online_vcpus); n++) {
+			ulong ga = memslot->base_gfn << PAGE_SHIFT;
+			ulong ga_end = ga + (memslot->npages << PAGE_SHIFT);
+
+			kvmppc_mmu_pte_pflush(kvm->vcpus[n], ga, ga_end);
+		}
+		n = ALIGN(memslot->npages, BITS_PER_LONG) / 8;
+		memset(memslot->dirty_bitmap, 0, n);
+	}
+
+	r = 0;
+out:
+	up_write(&kvm->slots_lock);
+	return r;
+}
+
+int kvmppc_core_check_processor_compat(void)
+{
+	return 0;
+}
+
+struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s;
+	struct kvm_vcpu *vcpu;
+	int err;
+
+	vcpu_book3s = (struct kvmppc_vcpu_book3s *)__get_free_pages( GFP_KERNEL | __GFP_ZERO,
+			get_order(sizeof(struct kvmppc_vcpu_book3s)));
+	if (!vcpu_book3s) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	vcpu = &vcpu_book3s->vcpu;
+	err = kvm_vcpu_init(vcpu, kvm, id);
+	if (err)
+		goto free_vcpu;
+
+	vcpu->arch.host_retip = kvm_return_point;
+	vcpu->arch.host_msr = mfmsr();
+	/* default to book3s_64 (970fx) */
+	vcpu->arch.pvr = 0x3C0301;
+	kvmppc_set_pvr(vcpu, vcpu->arch.pvr);
+	vcpu_book3s->slb_nr = 64;
+
+	/* remember where some real-mode handlers are */
+	vcpu->arch.trampoline_lowmem = kvmppc_trampoline_lowmem;
+	vcpu->arch.trampoline_enter = kvmppc_trampoline_enter;
+	vcpu->arch.highmem_handler = (ulong)kvmppc_handler_highmem;
+
+	vcpu->arch.shadow_msr = MSR_USER64;
+
+	err = __init_new_context();
+	if (err < 0)
+		goto free_vcpu;
+	vcpu_book3s->context_id = err;
+
+	vcpu_book3s->vsid_max = ((vcpu_book3s->context_id + 1) << USER_ESID_BITS) - 1;
+	vcpu_book3s->vsid_first = vcpu_book3s->context_id << USER_ESID_BITS;
+	vcpu_book3s->vsid_next = vcpu_book3s->vsid_first;
+
+	return vcpu;
+
+free_vcpu:
+	free_pages((long)vcpu_book3s, get_order(sizeof(struct kvmppc_vcpu_book3s)));
+out:
+	return ERR_PTR(err);
+}
+
+void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+
+	__destroy_context(vcpu_book3s->context_id);
+	kvm_vcpu_uninit(vcpu);
+	free_pages((long)vcpu_book3s, get_order(sizeof(struct kvmppc_vcpu_book3s)));
+}
+
+extern int __kvmppc_vcpu_entry(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
+int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
+{
+	int ret;
+
+	/* No need to go into the guest when all we do is going out */
+	if (signal_pending(current)) {
+		kvm_run->exit_reason = KVM_EXIT_INTR;
+		return -EINTR;
+	}
+
+	/* XXX we get called with irq disabled - change that! */
+	local_irq_enable();
+
+	ret = __kvmppc_vcpu_entry(kvm_run, vcpu);
+
+	local_irq_disable();
+
+	return ret;
+}
+
+static int kvmppc_book3s_init(void)
+{
+	return kvm_init(NULL, sizeof(struct kvmppc_vcpu_book3s), THIS_MODULE);
+}
+
+static void kvmppc_book3s_exit(void)
+{
+	kvm_exit();
+}
+
+module_init(kvmppc_book3s_init);
+module_exit(kvmppc_book3s_exit);
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 10/27] Add book3s.c
@ 2009-10-21 15:03                             ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti

This adds the book3s core handling file. Here everything that is generic to
desktop PowerPC cores is handled, including interrupt injections, MSR settings,
etc.

It basically takes over the same role as booke.c for embedded PowerPCs.

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v3 -> v4:

  - use context_id instead of mm_alloc

v4 -> v5:

  - make pvr 32 bits
---
 arch/powerpc/kvm/book3s.c |  919 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 919 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s.c

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
new file mode 100644
index 0000000..0f4305b
--- /dev/null
+++ b/arch/powerpc/kvm/book3s.c
@@ -0,0 +1,919 @@
+/*
+ * Copyright (C) 2009. SUSE Linux Products GmbH. All rights reserved.
+ *
+ * Authors:
+ *    Alexander Graf <agraf@suse.de>
+ *    Kevin Wolf <mail@kevin-wolf.de>
+ *
+ * Description:
+ * This file is derived from arch/powerpc/kvm/44x.c,
+ * by Hollis Blanchard <hollisb@us.ibm.com>.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kvm_host.h>
+#include <linux/err.h>
+
+#include <asm/reg.h>
+#include <asm/cputable.h>
+#include <asm/cacheflush.h>
+#include <asm/tlbflush.h>
+#include <asm/uaccess.h>
+#include <asm/io.h>
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+#include <asm/mmu_context.h>
+#include <linux/sched.h>
+#include <linux/vmalloc.h>
+
+#define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU
+
+// #define EXIT_DEBUG
+// #define EXIT_DEBUG_SIMPLE
+
+// #define AGGRESSIVE_DEC
+
+struct kvm_stats_debugfs_item debugfs_entries[] = {
+	{ "exits",       VCPU_STAT(sum_exits) },
+	{ "mmio",        VCPU_STAT(mmio_exits) },
+	{ "sig",         VCPU_STAT(signal_exits) },
+	{ "sysc",        VCPU_STAT(syscall_exits) },
+	{ "inst_emu",    VCPU_STAT(emulated_inst_exits) },
+	{ "dec",         VCPU_STAT(dec_exits) },
+	{ "ext_intr",    VCPU_STAT(ext_intr_exits) },
+	{ "queue_intr",  VCPU_STAT(queue_intr) },
+	{ "halt_wakeup", VCPU_STAT(halt_wakeup) },
+	{ "pf_storage",  VCPU_STAT(pf_storage) },
+	{ "sp_storage",  VCPU_STAT(sp_storage) },
+	{ "pf_instruc",  VCPU_STAT(pf_instruc) },
+	{ "sp_instruc",  VCPU_STAT(sp_instruc) },
+	{ "ld",          VCPU_STAT(ld) },
+	{ "ld_slow",     VCPU_STAT(ld_slow) },
+	{ "st",          VCPU_STAT(st) },
+	{ "st_slow",     VCPU_STAT(st_slow) },
+	{ NULL }
+};
+
+void kvmppc_core_load_host_debugstate(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvmppc_core_load_guest_debugstate(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+	memcpy(get_paca()->kvm_slb, to_book3s(vcpu)->slb_shadow, sizeof(get_paca()->kvm_slb));
+	get_paca()->kvm_slb_max = to_book3s(vcpu)->slb_shadow_max;
+}
+
+void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
+{
+	memcpy(to_book3s(vcpu)->slb_shadow, get_paca()->kvm_slb, sizeof(get_paca()->kvm_slb));
+	to_book3s(vcpu)->slb_shadow_max = get_paca()->kvm_slb_max;
+}
+
+#if defined(AGGRESSIVE_DEC) || defined(EXIT_DEBUG)
+static u32 kvmppc_get_dec(struct kvm_vcpu *vcpu)
+{
+	u64 jd = mftb() - vcpu->arch.dec_jiffies;
+	return vcpu->arch.dec - jd;
+}
+#endif
+
+void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
+{
+	ulong old_msr = vcpu->arch.msr;
+
+#ifdef EXIT_DEBUG
+	printk(KERN_INFO "KVM: Set MSR to 0x%llx\n", msr);
+#endif
+	msr &= to_book3s(vcpu)->msr_mask;
+	vcpu->arch.msr = msr;
+	vcpu->arch.shadow_msr = msr | MSR_USER32;
+	vcpu->arch.shadow_msr &= ( MSR_VEC | MSR_VSX | MSR_FP | MSR_FE0 |
+				   MSR_USER64 | MSR_SE | MSR_BE | MSR_DE |
+				   MSR_FE1);
+
+	if (msr & (MSR_WE|MSR_POW)) {
+		if (!vcpu->arch.pending_exceptions) {
+			kvm_vcpu_block(vcpu);
+			vcpu->stat.halt_wakeup++;
+		}
+	}
+
+	if (((vcpu->arch.msr & (MSR_IR|MSR_DR)) != (old_msr & (MSR_IR|MSR_DR))) ||
+	    (vcpu->arch.msr & MSR_PR) != (old_msr & MSR_PR)) {
+		kvmppc_mmu_flush_segments(vcpu);
+		kvmppc_mmu_map_segment(vcpu, vcpu->arch.pc);
+	}
+}
+
+void kvmppc_inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 flags)
+{
+	vcpu->arch.srr0 = vcpu->arch.pc;
+	vcpu->arch.srr1 = vcpu->arch.msr | flags;
+	vcpu->arch.pc = to_book3s(vcpu)->hior + vec;
+	vcpu->arch.mmu.reset_msr(vcpu);
+}
+
+void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec)
+{
+	unsigned int prio;
+
+	vcpu->stat.queue_intr++;
+	switch (vec) {
+	case 0x100: prio = BOOK3S_IRQPRIO_SYSTEM_RESET;		break;
+	case 0x200: prio = BOOK3S_IRQPRIO_MACHINE_CHECK;	break;
+	case 0x300: prio = BOOK3S_IRQPRIO_DATA_STORAGE;		break;
+	case 0x380: prio = BOOK3S_IRQPRIO_DATA_SEGMENT;		break;
+	case 0x400: prio = BOOK3S_IRQPRIO_INST_STORAGE;		break;
+	case 0x480: prio = BOOK3S_IRQPRIO_INST_SEGMENT;		break;
+	case 0x500: prio = BOOK3S_IRQPRIO_EXTERNAL;		break;
+	case 0x600: prio = BOOK3S_IRQPRIO_ALIGNMENT;		break;
+	case 0x700: prio = BOOK3S_IRQPRIO_PROGRAM;		break;
+	case 0x800: prio = BOOK3S_IRQPRIO_FP_UNAVAIL;		break;
+	case 0x900: prio = BOOK3S_IRQPRIO_DECREMENTER;		break;
+	case 0xc00: prio = BOOK3S_IRQPRIO_SYSCALL;		break;
+	case 0xd00: prio = BOOK3S_IRQPRIO_DEBUG;		break;
+	case 0xf20: prio = BOOK3S_IRQPRIO_ALTIVEC;		break;
+	case 0xf40: prio = BOOK3S_IRQPRIO_VSX;			break;
+	default:    prio = BOOK3S_IRQPRIO_MAX;			break;
+	}
+
+	set_bit(prio, &vcpu->arch.pending_exceptions);
+#ifdef EXIT_DEBUG
+	printk(KERN_INFO "Queueing interrupt %x\n", vec);
+#endif
+}
+
+
+void kvmppc_core_queue_program(struct kvm_vcpu *vcpu)
+{
+	kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_PROGRAM);
+}
+
+void kvmppc_core_queue_dec(struct kvm_vcpu *vcpu)
+{
+	kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_DECREMENTER);
+}
+
+int kvmppc_core_pending_dec(struct kvm_vcpu *vcpu)
+{
+	return test_bit(BOOK3S_INTERRUPT_DECREMENTER >> 7, &vcpu->arch.pending_exceptions);
+}
+
+void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
+                                struct kvm_interrupt *irq)
+{
+	kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_EXTERNAL);
+}
+
+int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
+{
+	int deliver = 1;
+	int vec = 0;
+
+	switch (priority) {
+	case BOOK3S_IRQPRIO_DECREMENTER:
+		deliver = vcpu->arch.msr & MSR_EE;
+		vec = BOOK3S_INTERRUPT_DECREMENTER;
+		break;
+	case BOOK3S_IRQPRIO_EXTERNAL:
+		deliver = vcpu->arch.msr & MSR_EE;
+		vec = BOOK3S_INTERRUPT_EXTERNAL;
+		break;
+	case BOOK3S_IRQPRIO_SYSTEM_RESET:
+		vec = BOOK3S_INTERRUPT_SYSTEM_RESET;
+		break;
+	case BOOK3S_IRQPRIO_MACHINE_CHECK:
+		vec = BOOK3S_INTERRUPT_MACHINE_CHECK;
+		break;
+	case BOOK3S_IRQPRIO_DATA_STORAGE:
+		vec = BOOK3S_INTERRUPT_DATA_STORAGE;
+		break;
+	case BOOK3S_IRQPRIO_INST_STORAGE:
+		vec = BOOK3S_INTERRUPT_INST_STORAGE;
+		break;
+	case BOOK3S_IRQPRIO_DATA_SEGMENT:
+		vec = BOOK3S_INTERRUPT_DATA_SEGMENT;
+		break;
+	case BOOK3S_IRQPRIO_INST_SEGMENT:
+		vec = BOOK3S_INTERRUPT_INST_SEGMENT;
+		break;
+	case BOOK3S_IRQPRIO_ALIGNMENT:
+		vec = BOOK3S_INTERRUPT_ALIGNMENT;
+		break;
+	case BOOK3S_IRQPRIO_PROGRAM:
+		vec = BOOK3S_INTERRUPT_PROGRAM;
+		break;
+	case BOOK3S_IRQPRIO_VSX:
+		vec = BOOK3S_INTERRUPT_VSX;
+		break;
+	case BOOK3S_IRQPRIO_ALTIVEC:
+		vec = BOOK3S_INTERRUPT_ALTIVEC;
+		break;
+	case BOOK3S_IRQPRIO_FP_UNAVAIL:
+		vec = BOOK3S_INTERRUPT_FP_UNAVAIL;
+		break;
+	case BOOK3S_IRQPRIO_SYSCALL:
+		vec = BOOK3S_INTERRUPT_SYSCALL;
+		break;
+	case BOOK3S_IRQPRIO_DEBUG:
+		vec = BOOK3S_INTERRUPT_TRACE;
+		break;
+	case BOOK3S_IRQPRIO_PERFORMANCE_MONITOR:
+		vec = BOOK3S_INTERRUPT_PERFMON;
+		break;
+	default:
+		deliver = 0;
+		printk(KERN_ERR "KVM: Unknown interrupt: 0x%x\n", priority);
+		break;
+	}
+
+#if 0
+	printk(KERN_INFO "Deliver interrupt 0x%x? %x\n", vec, deliver);
+#endif
+
+	if (deliver)
+		kvmppc_inject_interrupt(vcpu, vec, 0ULL);
+
+	return deliver;
+}
+
+void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu)
+{
+	unsigned long *pending = &vcpu->arch.pending_exceptions;
+	unsigned int priority;
+
+	/* XXX be more clever here - no need to mftb() on every entry */
+	/* Issue DEC again if it's still active */
+#ifdef AGGRESSIVE_DEC
+	if (vcpu->arch.msr & MSR_EE)
+		if (kvmppc_get_dec(vcpu) & 0x80000000)
+			kvmppc_core_queue_dec(vcpu);
+#endif
+
+#ifdef EXIT_DEBUG
+	if (vcpu->arch.pending_exceptions)
+		printk(KERN_EMERG "KVM: Check pending: %lx\n", vcpu->arch.pending_exceptions);
+#endif
+	priority = __ffs(*pending);
+	while (priority <= (sizeof(unsigned int) * 8)) {
+		if (kvmppc_book3s_irqprio_deliver(vcpu, priority)) {
+			clear_bit(priority, &vcpu->arch.pending_exceptions);
+			break;
+		}
+
+		priority = find_next_bit(pending,
+					 BITS_PER_BYTE * sizeof(*pending),
+					 priority + 1);
+	}
+}
+
+void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr)
+{
+	vcpu->arch.pvr = pvr;
+	if ((pvr >= 0x330000) && (pvr < 0x70330000)) {
+		kvmppc_mmu_book3s_64_init(vcpu);
+		to_book3s(vcpu)->hior = 0xfff00000;
+		to_book3s(vcpu)->msr_mask = 0xffffffffffffffffULL;
+	} else {
+		kvmppc_mmu_book3s_32_init(vcpu);
+		to_book3s(vcpu)->hior = 0;
+		to_book3s(vcpu)->msr_mask = 0xffffffffULL;
+	}
+
+	/* If we are in hypervisor level on 970, we can tell the CPU to
+	 * treat DCBZ as 32 bytes store */
+	vcpu->arch.hflags &= ~BOOK3S_HFLAG_DCBZ32;
+	if (vcpu->arch.mmu.is_dcbz32(vcpu) && (mfmsr() & MSR_HV) &&
+	    !strcmp(cur_cpu_spec->platform, "ppc970"))
+		vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
+
+}
+
+/* Book3s_32 CPUs always have a 32 byte cache line size, which Linux assumes. To
+ * make Book3s_32 Linux work on Book3s_64, we have to make sure we trap dcbz to
+ * emulate a 32 byte dcbz length.
+ *
+ * The Book3s_64 inventors also realized this case and implemented a special bit
+ * in the HID5 register, which is a hypervisor resource. Thus we can't use it.
+ *
+ * My approach here is to patch the dcbz instruction on executing pages.
+ */
+static void kvmppc_patch_dcbz(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte)
+{
+	bool touched = false;
+	hva_t hpage;
+	u32 *page;
+	int i;
+
+	hpage = gfn_to_hva(vcpu->kvm, pte->raddr >> PAGE_SHIFT);
+	if (kvm_is_error_hva(hpage))
+		return;
+
+	hpage |= pte->raddr & ~PAGE_MASK;
+	hpage &= ~0xFFFULL;
+
+	page = vmalloc(HW_PAGE_SIZE);
+
+	if (copy_from_user(page, (void __user *)hpage, HW_PAGE_SIZE))
+		goto out;
+
+	for (i=0; i < HW_PAGE_SIZE / 4; i++)
+		if ((page[i] & 0xff0007ff) == INS_DCBZ) {
+			page[i] &= 0xfffffff7; // reserved instruction, so we trap
+			touched = true;
+		}
+
+	if (touched)
+		copy_to_user((void __user *)hpage, page, HW_PAGE_SIZE);
+
+out:
+	vfree(page);
+}
+
+static int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, bool data,
+			 struct kvmppc_pte *pte)
+{
+	int relocated = (vcpu->arch.msr & (data ? MSR_DR : MSR_IR));
+	int r;
+
+	if (relocated) {
+		r = vcpu->arch.mmu.xlate(vcpu, eaddr, pte, data);
+	} else {
+		pte->eaddr = eaddr;
+		pte->raddr = eaddr & 0xffffffff;
+		pte->vpage = eaddr >> 12;
+		switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
+		case 0:
+			pte->vpage |= VSID_REAL;
+		case MSR_DR:
+			pte->vpage |= VSID_REAL_DR;
+		case MSR_IR:
+			pte->vpage |= VSID_REAL_IR;
+		}
+		pte->may_read = true;
+		pte->may_write = true;
+		pte->may_execute = true;
+		r = 0;
+	}
+
+	return r;
+}
+
+static hva_t kvmppc_bad_hva(void)
+{
+	return PAGE_OFFSET;
+}
+
+static hva_t kvmppc_pte_to_hva(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte,
+			       bool read)
+{
+	hva_t hpage;
+
+	if (read && !pte->may_read)
+		goto err;
+
+	if (!read && !pte->may_write)
+		goto err;
+
+	hpage = gfn_to_hva(vcpu->kvm, pte->raddr >> PAGE_SHIFT);
+	if (kvm_is_error_hva(hpage))
+		goto err;
+
+	return hpage | (pte->raddr & ~PAGE_MASK);
+err:
+	return kvmppc_bad_hva();
+}
+
+int kvmppc_st(struct kvm_vcpu *vcpu, ulong eaddr, int size, void *ptr)
+{
+	struct kvmppc_pte pte;
+	hva_t hva = eaddr;
+
+	vcpu->stat.st++;
+
+	if (kvmppc_xlate(vcpu, eaddr, false, &pte))
+		goto err;
+
+	hva = kvmppc_pte_to_hva(vcpu, &pte, false);
+	if (kvm_is_error_hva(hva))
+		goto err;
+
+	if (copy_to_user((void __user *)hva, ptr, size)) {
+		printk(KERN_INFO "kvmppc_st at 0x%lx failed\n", hva);
+		goto err;
+	}
+
+	return 0;
+
+err:
+	return -ENOENT;
+}
+
+int kvmppc_ld(struct kvm_vcpu *vcpu, ulong eaddr, int size, void *ptr,
+		      bool data)
+{
+	struct kvmppc_pte pte;
+	hva_t hva = eaddr;
+
+	vcpu->stat.ld++;
+
+	if (kvmppc_xlate(vcpu, eaddr, data, &pte))
+		goto err;
+
+	hva = kvmppc_pte_to_hva(vcpu, &pte, true);
+	if (kvm_is_error_hva(hva))
+		goto err;
+
+	if (copy_from_user(ptr, (void __user *)hva, size)) {
+		printk(KERN_INFO "kvmppc_ld at 0x%lx failed\n", hva);
+		goto err;
+	}
+
+	return 0;
+
+err:
+	return -ENOENT;
+}
+
+static int kvmppc_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+	return kvm_is_visible_gfn(vcpu->kvm, gfn);
+}
+
+int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu,
+			    ulong eaddr, int vec)
+{
+	bool data = (vec == BOOK3S_INTERRUPT_DATA_STORAGE);
+	int r = RESUME_GUEST;
+	int relocated;
+	int page_found = 0;
+	struct kvmppc_pte pte;
+	bool is_mmio = false;
+
+	if ( vec == BOOK3S_INTERRUPT_DATA_STORAGE ) {
+		relocated = (vcpu->arch.msr & MSR_DR);
+	} else {
+		relocated = (vcpu->arch.msr & MSR_IR);
+	}
+
+	/* Resolve real address if translation turned on */
+	if (relocated) {
+		page_found = vcpu->arch.mmu.xlate(vcpu, eaddr, &pte, data);
+	} else {
+		pte.may_execute = true;
+		pte.may_read = true;
+		pte.may_write = true;
+		pte.raddr = eaddr & 0xffffffff;
+		pte.eaddr = eaddr;
+		pte.vpage = eaddr >> 12;
+		switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
+		case 0:
+			pte.vpage |= VSID_REAL;
+		case MSR_DR:
+			pte.vpage |= VSID_REAL_DR;
+		case MSR_IR:
+			pte.vpage |= VSID_REAL_IR;
+		}
+	}
+
+	if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
+	   (!(vcpu->arch.hflags & BOOK3S_HFLAG_DCBZ32))) {
+		/*
+		 * If we do the dcbz hack, we have to NX on every execution,
+		 * so we can patch the executing code. This renders our guest
+		 * NX-less.
+		 */
+		pte.may_execute = !data;
+	}
+
+	if (page_found == -ENOENT) {
+		/* Page not found in guest PTE entries */
+		vcpu->arch.dear = vcpu->arch.fault_dear;
+		to_book3s(vcpu)->dsisr = vcpu->arch.fault_dsisr;
+		vcpu->arch.msr |= (vcpu->arch.shadow_msr & 0x00000000f8000000ULL);
+		kvmppc_book3s_queue_irqprio(vcpu, vec);
+	} else if (page_found == -EPERM) {
+		/* Storage protection */
+		vcpu->arch.dear = vcpu->arch.fault_dear;
+		to_book3s(vcpu)->dsisr = vcpu->arch.fault_dsisr & ~DSISR_NOHPTE;
+		to_book3s(vcpu)->dsisr |= DSISR_PROTFAULT;
+		vcpu->arch.msr |= (vcpu->arch.shadow_msr & 0x00000000f8000000ULL);
+		kvmppc_book3s_queue_irqprio(vcpu, vec);
+	} else if (page_found == -EINVAL) {
+		/* Page not found in guest SLB */
+		vcpu->arch.dear = vcpu->arch.fault_dear;
+		kvmppc_book3s_queue_irqprio(vcpu, vec + 0x80);
+	} else if (!is_mmio &&
+		   kvmppc_visible_gfn(vcpu, pte.raddr >> PAGE_SHIFT)) {
+		/* The guest's PTE is not mapped yet. Map on the host */
+		kvmppc_mmu_map_page(vcpu, &pte);
+		if (data)
+			vcpu->stat.sp_storage++;
+		else if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
+			(!(vcpu->arch.hflags & BOOK3S_HFLAG_DCBZ32)))
+			kvmppc_patch_dcbz(vcpu, &pte);
+	} else {
+		/* MMIO */
+		vcpu->stat.mmio_exits++;
+		vcpu->arch.paddr_accessed = pte.raddr;
+		r = kvmppc_emulate_mmio(run, vcpu);
+		if ( r == RESUME_HOST_NV )
+			r = RESUME_HOST;
+		if ( r == RESUME_GUEST_NV )
+			r = RESUME_GUEST;
+	}
+
+	return r;
+}
+
+int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
+                       unsigned int exit_nr)
+{
+	int r = RESUME_HOST;
+
+	vcpu->stat.sum_exits++;
+
+	run->exit_reason = KVM_EXIT_UNKNOWN;
+	run->ready_for_interrupt_injection = 1;
+#ifdef EXIT_DEBUG
+	printk(KERN_EMERG "exit_nr=0x%x | pc=0x%lx | dar=0x%lx | dec=0x%x | msr=0x%lx\n",
+		exit_nr, vcpu->arch.pc, vcpu->arch.fault_dear,
+		kvmppc_get_dec(vcpu), vcpu->arch.msr);
+#elif defined (EXIT_DEBUG_SIMPLE)
+	if ((exit_nr != 0x900) && (exit_nr != 0x500))
+		printk(KERN_EMERG "exit_nr=0x%x | pc=0x%lx | dar=0x%lx | msr=0x%lx\n",
+			exit_nr, vcpu->arch.pc, vcpu->arch.fault_dear,
+			vcpu->arch.msr);
+#endif
+	kvm_resched(vcpu);
+	switch (exit_nr) {
+	case BOOK3S_INTERRUPT_INST_STORAGE:
+		vcpu->stat.pf_instruc++;
+		/* only care about PTEG not found errors, but leave NX alone */
+		if (vcpu->arch.shadow_msr & 0x40000000) {
+			r = kvmppc_handle_pagefault(run, vcpu, vcpu->arch.pc, exit_nr);
+			vcpu->stat.sp_instruc++;
+		} else if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
+			  (!(vcpu->arch.hflags & BOOK3S_HFLAG_DCBZ32))) {
+			/*
+			 * XXX If we do the dcbz hack we use the NX bit to flush&patch the page,
+			 *     so we can't use the NX bit inside the guest. Let's cross our fingers,
+			 *     that no guest that needs the dcbz hack does NX.
+			 */
+			kvmppc_mmu_pte_flush(vcpu, vcpu->arch.pc, ~0xFFFULL);
+		} else {
+			vcpu->arch.msr |= (vcpu->arch.shadow_msr & 0x58000000);
+			kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+			kvmppc_mmu_pte_flush(vcpu, vcpu->arch.pc, ~0xFFFULL);
+			r = RESUME_GUEST;
+		}
+		break;
+	case BOOK3S_INTERRUPT_DATA_STORAGE:
+		vcpu->stat.pf_storage++;
+		/* The only case we need to handle is missing shadow PTEs */
+		if (vcpu->arch.fault_dsisr & DSISR_NOHPTE) {
+			r = kvmppc_handle_pagefault(run, vcpu, vcpu->arch.fault_dear, exit_nr);
+		} else {
+			vcpu->arch.dear = vcpu->arch.fault_dear;
+			to_book3s(vcpu)->dsisr = vcpu->arch.fault_dsisr;
+			kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+			kvmppc_mmu_pte_flush(vcpu, vcpu->arch.dear, ~0xFFFULL);
+			r = RESUME_GUEST;
+		}
+		break;
+	case BOOK3S_INTERRUPT_DATA_SEGMENT:
+		if (kvmppc_mmu_map_segment(vcpu, vcpu->arch.fault_dear) < 0) {
+			vcpu->arch.dear = vcpu->arch.fault_dear;
+			kvmppc_book3s_queue_irqprio(vcpu,
+				BOOK3S_INTERRUPT_DATA_SEGMENT);
+		}
+		r = RESUME_GUEST;
+		break;
+	case BOOK3S_INTERRUPT_INST_SEGMENT:
+		if (kvmppc_mmu_map_segment(vcpu, vcpu->arch.pc) < 0) {
+			kvmppc_book3s_queue_irqprio(vcpu,
+				BOOK3S_INTERRUPT_INST_SEGMENT);
+		}
+		r = RESUME_GUEST;
+		break;
+	/* We're good on these - the host merely wanted to get our attention */
+	case BOOK3S_INTERRUPT_DECREMENTER:
+		vcpu->stat.dec_exits++;
+		r = RESUME_GUEST;
+		break;
+	case BOOK3S_INTERRUPT_EXTERNAL:
+		vcpu->stat.ext_intr_exits++;
+		r = RESUME_GUEST;
+		break;
+	case BOOK3S_INTERRUPT_PROGRAM:
+	{
+		enum emulation_result er;
+
+		if (vcpu->arch.msr & MSR_PR) {
+#ifdef EXIT_DEBUG
+			printk(KERN_INFO "Userspace triggered 0x700 exception at 0x%lx (0x%x)\n", vcpu->arch.pc, vcpu->arch.last_inst);
+#endif
+			if ((vcpu->arch.last_inst & 0xff0007ff) !=
+			    (INS_DCBZ & 0xfffffff7)) {
+				kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+				r = RESUME_GUEST;
+				break;
+			}
+		}
+
+		vcpu->stat.emulated_inst_exits++;
+		er = kvmppc_emulate_instruction(run, vcpu);
+		switch (er) {
+		case EMULATE_DONE:
+			r = RESUME_GUEST;
+			break;
+		case EMULATE_FAIL:
+			printk(KERN_CRIT "%s: emulation at %lx failed (%08x)\n",
+			       __func__, vcpu->arch.pc, vcpu->arch.last_inst);
+			kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+			r = RESUME_GUEST;
+			break;
+		default:
+			BUG();
+		}
+		break;
+	}
+	case BOOK3S_INTERRUPT_SYSCALL:
+#ifdef EXIT_DEBUG
+		printk(KERN_INFO "Syscall Nr %d\n", (int)vcpu->arch.gpr[0]);
+#endif
+		vcpu->stat.syscall_exits++;
+		kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+		r = RESUME_GUEST;
+		break;
+	case BOOK3S_INTERRUPT_MACHINE_CHECK:
+	case BOOK3S_INTERRUPT_FP_UNAVAIL:
+	case BOOK3S_INTERRUPT_TRACE:
+	case BOOK3S_INTERRUPT_ALTIVEC:
+	case BOOK3S_INTERRUPT_VSX:
+		kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+		r = RESUME_GUEST;
+		break;
+	default:
+		/* Ugh - bork here! What did we get? */
+		printk(KERN_EMERG "exit_nr=0x%x | pc=0x%lx | msr=0x%lx\n", exit_nr, vcpu->arch.pc, vcpu->arch.shadow_msr);
+		r = RESUME_HOST;
+		BUG();
+		break;
+	}
+
+
+	if (!(r & RESUME_HOST)) {
+		/* To avoid clobbering exit_reason, only check for signals if
+		 * we aren't already exiting to userspace for some other
+		 * reason. */
+		if (signal_pending(current)) {
+#ifdef EXIT_DEBUG
+			printk(KERN_EMERG "KVM: Going back to host\n");
+#endif
+			vcpu->stat.signal_exits++;
+			run->exit_reason = KVM_EXIT_INTR;
+			r = -EINTR;
+		} else {
+			/* In case an interrupt came in that was triggered
+			 * from userspace (like DEC), we need to check what
+			 * to inject now! */
+			kvmppc_core_deliver_interrupts(vcpu);
+		}
+	}
+
+#ifdef EXIT_DEBUG
+	printk(KERN_EMERG "KVM exit: vcpu=0x%p pc=0x%lx r=0x%x\n", vcpu, vcpu->arch.pc, r);
+#endif
+
+	return r;
+}
+
+int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	int i;
+
+	regs->pc = vcpu->arch.pc;
+	regs->cr = vcpu->arch.cr;
+	regs->ctr = vcpu->arch.ctr;
+	regs->lr = vcpu->arch.lr;
+	regs->xer = vcpu->arch.xer;
+	regs->msr = vcpu->arch.msr;
+	regs->srr0 = vcpu->arch.srr0;
+	regs->srr1 = vcpu->arch.srr1;
+	regs->pid = vcpu->arch.pid;
+	regs->sprg0 = vcpu->arch.sprg0;
+	regs->sprg1 = vcpu->arch.sprg1;
+	regs->sprg2 = vcpu->arch.sprg2;
+	regs->sprg3 = vcpu->arch.sprg3;
+	regs->sprg4 = vcpu->arch.sprg4;
+	regs->sprg5 = vcpu->arch.sprg5;
+	regs->sprg6 = vcpu->arch.sprg6;
+	regs->sprg7 = vcpu->arch.sprg7;
+
+	for (i = 0; i < ARRAY_SIZE(regs->gpr); i++)
+		regs->gpr[i] = vcpu->arch.gpr[i];
+
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	int i;
+
+	vcpu->arch.pc = regs->pc;
+	vcpu->arch.cr = regs->cr;
+	vcpu->arch.ctr = regs->ctr;
+	vcpu->arch.lr = regs->lr;
+	vcpu->arch.xer = regs->xer;
+	kvmppc_set_msr(vcpu, regs->msr);
+	vcpu->arch.srr0 = regs->srr0;
+	vcpu->arch.srr1 = regs->srr1;
+	vcpu->arch.sprg0 = regs->sprg0;
+	vcpu->arch.sprg1 = regs->sprg1;
+	vcpu->arch.sprg2 = regs->sprg2;
+	vcpu->arch.sprg3 = regs->sprg3;
+	vcpu->arch.sprg4 = regs->sprg4;
+	vcpu->arch.sprg5 = regs->sprg5;
+	vcpu->arch.sprg6 = regs->sprg6;
+	vcpu->arch.sprg7 = regs->sprg7;
+
+	for (i = 0; i < ARRAY_SIZE(vcpu->arch.gpr); i++)
+		vcpu->arch.gpr[i] = regs->gpr[i];
+
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
+                                  struct kvm_sregs *sregs)
+{
+	sregs->pvr = vcpu->arch.pvr;
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
+                                  struct kvm_sregs *sregs)
+{
+	kvmppc_set_pvr(vcpu, sregs->pvr);
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -ENOTSUPP;
+}
+
+int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -ENOTSUPP;
+}
+
+int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
+                                  struct kvm_translation *tr)
+{
+	return 0;
+}
+
+/*
+ * Get (and clear) the dirty memory log for a memory slot.
+ */
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
+				      struct kvm_dirty_log *log)
+{
+	int r;
+	int n;
+	struct kvm_memory_slot *memslot;
+	int is_dirty = 0;
+
+	down_write(&kvm->slots_lock);
+
+	r = kvm_get_dirty_log(kvm, log, &is_dirty);
+	if (r)
+		goto out;
+
+	/* If nothing is dirty, don't bother messing with page tables. */
+	if (is_dirty) {
+		memslot = &kvm->memslots[log->slot];
+		for (n = 0; n < atomic_read(&kvm->online_vcpus); n++) {
+			ulong ga = memslot->base_gfn << PAGE_SHIFT;
+			ulong ga_end = ga + (memslot->npages << PAGE_SHIFT);
+
+			kvmppc_mmu_pte_pflush(kvm->vcpus[n], ga, ga_end);
+		}
+		n = ALIGN(memslot->npages, BITS_PER_LONG) / 8;
+		memset(memslot->dirty_bitmap, 0, n);
+	}
+
+	r = 0;
+out:
+	up_write(&kvm->slots_lock);
+	return r;
+}
+
+int kvmppc_core_check_processor_compat(void)
+{
+	return 0;
+}
+
+struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s;
+	struct kvm_vcpu *vcpu;
+	int err;
+
+	vcpu_book3s = (struct kvmppc_vcpu_book3s *)__get_free_pages( GFP_KERNEL | __GFP_ZERO,
+			get_order(sizeof(struct kvmppc_vcpu_book3s)));
+	if (!vcpu_book3s) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	vcpu = &vcpu_book3s->vcpu;
+	err = kvm_vcpu_init(vcpu, kvm, id);
+	if (err)
+		goto free_vcpu;
+
+	vcpu->arch.host_retip = kvm_return_point;
+	vcpu->arch.host_msr = mfmsr();
+	/* default to book3s_64 (970fx) */
+	vcpu->arch.pvr = 0x3C0301;
+	kvmppc_set_pvr(vcpu, vcpu->arch.pvr);
+	vcpu_book3s->slb_nr = 64;
+
+	/* remember where some real-mode handlers are */
+	vcpu->arch.trampoline_lowmem = kvmppc_trampoline_lowmem;
+	vcpu->arch.trampoline_enter = kvmppc_trampoline_enter;
+	vcpu->arch.highmem_handler = (ulong)kvmppc_handler_highmem;
+
+	vcpu->arch.shadow_msr = MSR_USER64;
+
+	err = __init_new_context();
+	if (err < 0)
+		goto free_vcpu;
+	vcpu_book3s->context_id = err;
+
+	vcpu_book3s->vsid_max = ((vcpu_book3s->context_id + 1) << USER_ESID_BITS) - 1;
+	vcpu_book3s->vsid_first = vcpu_book3s->context_id << USER_ESID_BITS;
+	vcpu_book3s->vsid_next = vcpu_book3s->vsid_first;
+
+	return vcpu;
+
+free_vcpu:
+	free_pages((long)vcpu_book3s, get_order(sizeof(struct kvmppc_vcpu_book3s)));
+out:
+	return ERR_PTR(err);
+}
+
+void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+
+	__destroy_context(vcpu_book3s->context_id);
+	kvm_vcpu_uninit(vcpu);
+	free_pages((long)vcpu_book3s, get_order(sizeof(struct kvmppc_vcpu_book3s)));
+}
+
+extern int __kvmppc_vcpu_entry(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
+int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
+{
+	int ret;
+
+	/* No need to go into the guest when all we would do is exit right away */
+	if (signal_pending(current)) {
+		kvm_run->exit_reason = KVM_EXIT_INTR;
+		return -EINTR;
+	}
+
+	/* XXX we get called with irq disabled - change that! */
+	local_irq_enable();
+
+	ret = __kvmppc_vcpu_entry(kvm_run, vcpu);
+
+	local_irq_disable();
+
+	return ret;
+}
+
+static int kvmppc_book3s_init(void)
+{
+	return kvm_init(NULL, sizeof(struct kvmppc_vcpu_book3s), THIS_MODULE);
+}
+
+static void kvmppc_book3s_exit(void)
+{
+	kvm_exit();
+}
+
+module_init(kvmppc_book3s_init);
+module_exit(kvmppc_book3s_exit);
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 11/27] Add book3s_64 Host MMU handling
  2009-10-21 15:03                             ` Alexander Graf
@ 2009-10-21 15:03                               ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti

We designed the Book3S port of KVM to be as modular as possible. Most
of the code can easily be used on a Book3S_32 host as well.

The main difference between 32 and 64 bit cores is the MMU. To keep
things well separated, we treat the book3s_64 MMU as one possible compile
option.

This patch adds all the MMU helpers the rest of the code needs in
order to modify the host's MMU, like setting PTEs and segments.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
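
For reference, a minimal sketch of how the rest of the series is expected to
drive these helpers (not part of the diff below; the real callers are the page
fault and segment fault paths in book3s.c). The wrapper name
shadow_map_example() is only an illustrative placeholder, and the declarations
of kvmppc_mmu_map_segment()/kvmppc_mmu_map_page() are assumed to come from
asm/kvm_book3s.h:

#include <linux/kvm_host.h>
#include <asm/kvm_book3s.h>

/* Illustrative only: back a resolved guest translation with host MMU state. */
static int shadow_map_example(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte)
{
	/* Make sure the segment the effective address lives in is covered
	 * by a host SLB entry ... */
	if (kvmppc_mmu_map_segment(vcpu, pte->eaddr) < 0)
		return -ENOENT;

	/* ... then enter the ea -> hpa mapping into the host hash table. */
	return kvmppc_mmu_map_page(vcpu, pte);
}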
 arch/powerpc/kvm/book3s_64_mmu_host.c |  412 +++++++++++++++++++++++++++++++++
 1 files changed, 412 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_mmu_host.c

diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
new file mode 100644
index 0000000..507f770
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -0,0 +1,412 @@
+/*
+ * Copyright (C) 2009 SUSE Linux Products GmbH. All rights reserved.
+ *
+ * Authors:
+ *     Alexander Graf <agraf@suse.de>
+ *     Kevin Wolf <mail@kevin-wolf.de>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+#include <asm/mmu-hash64.h>
+#include <asm/machdep.h>
+#include <asm/mmu_context.h>
+#include <asm/hw_irq.h>
+
+#define PTE_SIZE 12
+#define VSID_ALL 0
+
+// #define DEBUG_MMU
+// #define DEBUG_SLB
+
+void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, u64 guest_ea, u64 ea_mask)
+{
+	int i;
+
+#ifdef DEBUG_MMU
+	printk(KERN_INFO "KVM: Flushing %d Shadow PTEs: 0x%llx & 0x%llx\n",
+		vcpu->arch.hpte_cache_offset, guest_ea, ea_mask);
+#endif
+	BUG_ON(vcpu->arch.hpte_cache_offset > HPTEG_CACHE_NUM);
+	guest_ea &= ea_mask;
+	for (i=0; i<vcpu->arch.hpte_cache_offset; i++) {
+		struct hpte_cache *pte;
+
+		pte = &vcpu->arch.hpte_cache[i];
+		if (!pte->host_va)
+			continue;
+
+		if ((pte->pte.eaddr & ea_mask) == guest_ea) {
+#ifdef DEBUG_MMU
+	printk(KERN_INFO "KVM: Flushing SPT %d: 0x%llx (0x%llx) -> 0x%llx\n", i, pte->pte.eaddr, pte->pte.vpage, pte->host_va);
+#endif
+			ppc_md.hpte_invalidate(pte->slot, pte->host_va,
+					       MMU_PAGE_4K, MMU_SEGSIZE_256M,
+					       false);
+			pte->host_va = 0;
+			kvm_release_pfn_dirty(pte->pfn);
+		}
+	}
+
+	/* Doing a complete flush -> start from scratch */
+	if (!ea_mask)
+		vcpu->arch.hpte_cache_offset = 0;
+}
+
+void kvmppc_mmu_pte_vflush(struct kvm_vcpu *vcpu, u64 guest_vp, u64 vp_mask)
+{
+	int i;
+
+#ifdef DEBUG_MMU
+	printk(KERN_INFO "KVM: Flushing %d Shadow vPTEs: 0x%llx & 0x%llx\n",
+		vcpu->arch.hpte_cache_offset, guest_vp, vp_mask);
+#endif
+	BUG_ON(vcpu->arch.hpte_cache_offset > HPTEG_CACHE_NUM);
+	guest_vp &= vp_mask;
+	for (i=0; i<vcpu->arch.hpte_cache_offset; i++) {
+		struct hpte_cache *pte;
+
+		pte = &vcpu->arch.hpte_cache[i];
+		if (!pte->host_va)
+			continue;
+
+		if ((pte->pte.vpage & vp_mask) == guest_vp) {
+#ifdef DEBUG_MMU
+	printk(KERN_INFO "KVM: Flushing SPT %d: 0x%llx (0x%llx) -> 0x%llx\n", i, pte->pte.eaddr, pte->pte.vpage, pte->host_va);
+#endif
+			ppc_md.hpte_invalidate(pte->slot, pte->host_va,
+					       MMU_PAGE_4K, MMU_SEGSIZE_256M,
+					       false);
+			pte->host_va = 0;
+			kvm_release_pfn_dirty(pte->pfn);
+		}
+	}
+}
+
+void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, u64 pa_start, u64 pa_end)
+{
+	int i;
+
+#ifdef DEBUG_MMU
+	printk(KERN_INFO "KVM: Flushing %d Shadow pPTEs: 0x%llx - 0x%llx\n",
+		vcpu->arch.hpte_cache_offset, pa_start, pa_end);
+#endif
+	BUG_ON(vcpu->arch.hpte_cache_offset > HPTEG_CACHE_NUM);
+
+	for (i=0; i<vcpu->arch.hpte_cache_offset; i++) {
+		struct hpte_cache *pte;
+
+		pte = &vcpu->arch.hpte_cache[i];
+		if (!pte->host_va)
+			continue;
+
+		if ((pte->pte.raddr >= pa_start) && (pte->pte.raddr < pa_end)) {
+#ifdef DEBUG_MMU
+	printk(KERN_INFO "KVM: Flushing SPT %d: 0x%llx (0x%llx) -> 0x%llx\n", i, pte->pte.eaddr, pte->pte.raddr, pte->host_va);
+#endif
+			ppc_md.hpte_invalidate(pte->slot, pte->host_va,
+					       MMU_PAGE_4K, MMU_SEGSIZE_256M,
+					       false);
+			pte->host_va = 0;
+			kvm_release_pfn_dirty(pte->pfn);
+		}
+	}
+}
+
+struct kvmppc_pte *kvmppc_mmu_find_pte(struct kvm_vcpu *vcpu, u64 ea, bool data)
+{
+	int i;
+	u64 guest_vp;
+
+	guest_vp = vcpu->arch.mmu.ea_to_vp(vcpu, ea, false);
+	for (i=0; i<vcpu->arch.hpte_cache_offset; i++) {
+		struct hpte_cache *pte;
+
+		pte = &vcpu->arch.hpte_cache[i];
+		if (!pte->host_va)
+			continue;
+
+		if (pte->pte.vpage == guest_vp)
+			return &pte->pte;
+	}
+
+	return NULL;
+}
+
+static int kvmppc_mmu_hpte_cache_next(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->arch.hpte_cache_offset == HPTEG_CACHE_NUM)
+		kvmppc_mmu_pte_flush(vcpu, 0, 0);
+
+	return vcpu->arch.hpte_cache_offset++;
+}
+
+/* We keep 512 gvsid->hvsid entries, mapping the guest ones to the array using
+ * a hash, so we don't waste cycles on looping */
+static u16 kvmppc_sid_hash(struct kvm_vcpu *vcpu, u64 gvsid)
+{
+	return (u16)(((gvsid >> (SID_MAP_BITS * 7)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 6)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 5)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 4)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 3)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 2)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 1)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 0)) & SID_MAP_MASK));
+}
+
+
+static struct kvmppc_sid_map *find_sid_vsid(struct kvm_vcpu *vcpu, u64 gvsid)
+{
+	struct kvmppc_sid_map *map;
+
+	if (vcpu->arch.msr & MSR_PR)
+		gvsid |= VSID_PR;
+
+	map = &to_book3s(vcpu)->sid_map[kvmppc_sid_hash(vcpu, gvsid)];
+	if (map->guest_vsid == gvsid) {
+#ifdef DEBUG_SLB
+		printk(KERN_INFO "SLB: Searching 0x%llx -> 0x%llx\n", gvsid, map->host_vsid);
+#endif
+		return map;
+	}
+
+	map = &to_book3s(vcpu)->sid_map[SID_MAP_MASK - kvmppc_sid_hash(vcpu, gvsid)];
+	if (map->guest_vsid == gvsid) {
+#ifdef DEBUG_SLB
+		printk(KERN_INFO "SLB: Searching 0x%llx -> 0x%llx\n", gvsid, map->host_vsid);
+#endif
+		return map;
+	}
+
+#ifdef DEBUG_SLB
+	printk(KERN_INFO "SLB: Searching 0x%llx -> not found\n", gvsid);
+#endif
+	return NULL;
+}
+
+int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte)
+{
+	pfn_t hpaddr;
+	ulong hash, hpteg, va;
+	u64 vsid;
+	int ret;
+	int rflags = 0x192;
+	int vflags = 0;
+	int attempt = 0;
+	struct kvmppc_sid_map *map;
+
+	/* Get host physical address for gpa */
+	down_read(&current->mm->mmap_sem);
+	hpaddr = gfn_to_pfn(vcpu->kvm, orig_pte->raddr >> PAGE_SHIFT);
+	if (kvm_is_error_hva(hpaddr)) {
+		printk(KERN_INFO "Couldn't get guest page for gfn %llx!\n", orig_pte->eaddr);
+		up_read(&current->mm->mmap_sem);
+		return -EINVAL;
+	}
+	hpaddr <<= PAGE_SHIFT;
+#if PAGE_SHIFT == 12
+#elif PAGE_SHIFT == 16
+	hpaddr |= orig_pte->raddr & 0xf000;
+#else
+#error Unknown page size
+#endif
+
+	up_read(&current->mm->mmap_sem);
+
+	/* and write the mapping ea -> hpa into the pt */
+	vcpu->arch.mmu.esid_to_vsid(vcpu, orig_pte->eaddr >> SID_SHIFT, &vsid);
+	map = find_sid_vsid(vcpu, vsid);
+	if (!map) {
+		kvmppc_mmu_map_segment(vcpu, orig_pte->eaddr);
+		map = find_sid_vsid(vcpu, vsid);
+	}
+	BUG_ON(!map);
+
+	vsid = map->host_vsid;
+	va = hpt_va(orig_pte->eaddr, vsid, MMU_SEGSIZE_256M);
+
+	if (!orig_pte->may_write)
+		rflags |= HPTE_R_PP;
+	else
+		mark_page_dirty(vcpu->kvm, orig_pte->raddr >> PAGE_SHIFT);
+
+	if (!orig_pte->may_execute)
+		rflags |= HPTE_R_N;
+
+	hash = hpt_hash(va, PTE_SIZE, MMU_SEGSIZE_256M);
+
+map_again:
+	hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
+
+	/* In case we tried normal mapping already, let's nuke old entries */
+	if (attempt > 1)
+		if (ppc_md.hpte_remove(hpteg) < 0)
+			return -1;
+
+	ret = ppc_md.hpte_insert(hpteg, va, hpaddr, rflags, vflags, MMU_PAGE_4K, MMU_SEGSIZE_256M);
+
+	if (ret < 0) {
+		/* If we couldn't map a primary PTE, try a secondary */
+#ifdef USE_SECONDARY
+		hash = ~hash;
+		attempt++;
+		if (attempt % 2)
+			vflags = HPTE_V_SECONDARY;
+		else
+			vflags = 0;
+#else
+		attempt = 2;
+#endif
+		goto map_again;
+	} else {
+		int hpte_id = kvmppc_mmu_hpte_cache_next(vcpu);
+		struct hpte_cache *pte = &vcpu->arch.hpte_cache[hpte_id];
+#ifdef DEBUG_MMU
+		printk(KERN_INFO "KVM: %c%c Map 0x%llx: [%lx] 0x%lx (0x%llx) -> %lx\n",
+				 ((rflags & HPTE_R_PP) == 3) ? '-' : 'w',
+				 (rflags & HPTE_R_N) ? '-' : 'x',
+				 orig_pte->eaddr, hpteg, va, orig_pte->vpage,
+				 hpaddr);
+#endif
+		pte->slot = hpteg + (ret & 7);
+		pte->host_va = va;
+		pte->pte = *orig_pte;
+		pte->pfn = hpaddr >> PAGE_SHIFT;
+	}
+
+	return 0;
+}
+
+static struct kvmppc_sid_map *create_sid_map(struct kvm_vcpu *vcpu, u64 gvsid)
+{
+	struct kvmppc_sid_map *map;
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	static int backwards_map = 0;
+
+	if (vcpu->arch.msr & MSR_PR)
+		gvsid |= VSID_PR;
+
+	/* We might get collisions that trap in preceding order, so let's
+	   map them differently */
+	if (backwards_map)
+		map = &to_book3s(vcpu)->sid_map[SID_MAP_MASK - kvmppc_sid_hash(vcpu, gvsid)];
+	else
+		map = &to_book3s(vcpu)->sid_map[kvmppc_sid_hash(vcpu, gvsid)];
+	backwards_map = !backwards_map;
+
+	// Uh-oh ... out of mappings. Let's flush!
+	if (vcpu_book3s->vsid_next == vcpu_book3s->vsid_max) {
+		vcpu_book3s->vsid_next = vcpu_book3s->vsid_first;
+		memset(vcpu_book3s->sid_map, 0,
+		       sizeof(struct kvmppc_sid_map) * SID_MAP_NUM);
+		kvmppc_mmu_pte_flush(vcpu, 0, 0);
+		kvmppc_mmu_flush_segments(vcpu);
+	}
+	map->host_vsid = vcpu_book3s->vsid_next++;
+
+	map->guest_vsid = gvsid;
+	map->valid = true;
+
+	return map;
+}
+
+static int kvmppc_mmu_next_segment(struct kvm_vcpu *vcpu, ulong esid)
+{
+	int i;
+	int max_slb_size = 64;
+	int found_inval = -1;
+	int r;
+
+	if (!get_paca()->kvm_slb_max)
+		get_paca()->kvm_slb_max = 1;
+
+	/* Are we overwriting? */
+	for (i = 1; i < get_paca()->kvm_slb_max; i++) {
+		if (!(get_paca()->kvm_slb[i].esid & SLB_ESID_V))
+			found_inval = i;
+		else if ((get_paca()->kvm_slb[i].esid & ESID_MASK) == esid)
+			return i;
+	}
+
+	/* Found a spare entry that was invalidated before */
+	if (found_inval > 0)
+		return found_inval;
+
+	/* No spare invalid entry, so create one */
+
+	if (mmu_slb_size < 64)
+		max_slb_size = mmu_slb_size;
+
+	/* Overflowing -> purge */
+	if ((get_paca()->kvm_slb_max) == max_slb_size)
+		kvmppc_mmu_flush_segments(vcpu);
+
+	r = get_paca()->kvm_slb_max;
+	get_paca()->kvm_slb_max++;
+
+	return r;
+}
+
+int kvmppc_mmu_map_segment(struct kvm_vcpu *vcpu, ulong eaddr)
+{
+	u64 esid = eaddr >> SID_SHIFT;
+	u64 slb_esid = (eaddr & ESID_MASK) | SLB_ESID_V;
+	u64 slb_vsid = SLB_VSID_USER;
+	u64 gvsid;
+	int slb_index;
+	struct kvmppc_sid_map *map;
+
+	slb_index = kvmppc_mmu_next_segment(vcpu, eaddr & ESID_MASK);
+
+	if (vcpu->arch.mmu.esid_to_vsid(vcpu, esid, &gvsid)) {
+		/* Invalidate an entry */
+		get_paca()->kvm_slb[slb_index].esid = 0;
+		return -ENOENT;
+	}
+
+	map = find_sid_vsid(vcpu, gvsid);
+	if (!map)
+		map = create_sid_map(vcpu, gvsid);
+
+	map->guest_esid = esid;
+
+	slb_vsid |= (map->host_vsid << 12);
+	slb_vsid &= ~SLB_VSID_KP;
+	slb_esid |= slb_index;
+
+	get_paca()->kvm_slb[slb_index].esid = slb_esid;
+	get_paca()->kvm_slb[slb_index].vsid = slb_vsid;
+
+#ifdef DEBUG_SLB
+	printk(KERN_INFO "slbmte %#llx, %#llx\n", slb_vsid, slb_esid);
+#endif
+
+	return 0;
+}
+
+void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu)
+{
+	get_paca()->kvm_slb_max = 1;
+	get_paca()->kvm_slb[0].esid = 0;
+}
+
+void kvmppc_mmu_destroy(struct kvm_vcpu *vcpu)
+{
+	kvmppc_mmu_pte_flush(vcpu, 0, 0);
+}
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 11/27] Add book3s_64 Host MMU handling
@ 2009-10-21 15:03                               ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti

We designed the Book3S port of KVM to be as modular as possible. Most
of the code can easily be used on a Book3S_32 host as well.

The main difference between 32 and 64 bit cores is the MMU. To keep
things well separated, we treat the book3s_64 MMU as one possible compile
option.

This patch adds all the MMU helpers the rest of the code needs in
order to modify the host's MMU, like setting PTEs and segments.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kvm/book3s_64_mmu_host.c |  412 +++++++++++++++++++++++++++++++++
 1 files changed, 412 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_mmu_host.c

diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
new file mode 100644
index 0000000..507f770
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -0,0 +1,412 @@
+/*
+ * Copyright (C) 2009 SUSE Linux Products GmbH. All rights reserved.
+ *
+ * Authors:
+ *     Alexander Graf <agraf@suse.de>
+ *     Kevin Wolf <mail@kevin-wolf.de>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+#include <asm/mmu-hash64.h>
+#include <asm/machdep.h>
+#include <asm/mmu_context.h>
+#include <asm/hw_irq.h>
+
+#define PTE_SIZE 12
+#define VSID_ALL 0
+
+// #define DEBUG_MMU
+// #define DEBUG_SLB
+
+void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, u64 guest_ea, u64 ea_mask)
+{
+	int i;
+
+#ifdef DEBUG_MMU
+	printk(KERN_INFO "KVM: Flushing %d Shadow PTEs: 0x%llx & 0x%llx\n",
+		vcpu->arch.hpte_cache_offset, guest_ea, ea_mask);
+#endif
+	BUG_ON(vcpu->arch.hpte_cache_offset > HPTEG_CACHE_NUM);
+	guest_ea &= ea_mask;
+	for (i=0; i<vcpu->arch.hpte_cache_offset; i++) {
+		struct hpte_cache *pte;
+
+		pte = &vcpu->arch.hpte_cache[i];
+		if (!pte->host_va)
+			continue;
+
+		if ((pte->pte.eaddr & ea_mask) == guest_ea) {
+#ifdef DEBUG_MMU
+	printk(KERN_INFO "KVM: Flushing SPT %d: 0x%llx (0x%llx) -> 0x%llx\n", i, pte->pte.eaddr, pte->pte.vpage, pte->host_va);
+#endif
+			ppc_md.hpte_invalidate(pte->slot, pte->host_va,
+					       MMU_PAGE_4K, MMU_SEGSIZE_256M,
+					       false);
+			pte->host_va = 0;
+			kvm_release_pfn_dirty(pte->pfn);
+		}
+	}
+
+	/* Doing a complete flush -> start from scratch */
+	if (!ea_mask)
+		vcpu->arch.hpte_cache_offset = 0;
+}
+
+void kvmppc_mmu_pte_vflush(struct kvm_vcpu *vcpu, u64 guest_vp, u64 vp_mask)
+{
+	int i;
+
+#ifdef DEBUG_MMU
+	printk(KERN_INFO "KVM: Flushing %d Shadow vPTEs: 0x%llx & 0x%llx\n",
+		vcpu->arch.hpte_cache_offset, guest_vp, vp_mask);
+#endif
+	BUG_ON(vcpu->arch.hpte_cache_offset > HPTEG_CACHE_NUM);
+	guest_vp &= vp_mask;
+	for (i=0; i<vcpu->arch.hpte_cache_offset; i++) {
+		struct hpte_cache *pte;
+
+		pte = &vcpu->arch.hpte_cache[i];
+		if (!pte->host_va)
+			continue;
+
+		if ((pte->pte.vpage & vp_mask) == guest_vp) {
+#ifdef DEBUG_MMU
+	printk(KERN_INFO "KVM: Flushing SPT %d: 0x%llx (0x%llx) -> 0x%llx\n", i, pte->pte.eaddr, pte->pte.vpage, pte->host_va);
+#endif
+			ppc_md.hpte_invalidate(pte->slot, pte->host_va,
+					       MMU_PAGE_4K, MMU_SEGSIZE_256M,
+					       false);
+			pte->host_va = 0;
+			kvm_release_pfn_dirty(pte->pfn);
+		}
+	}
+}
+
+void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, u64 pa_start, u64 pa_end)
+{
+	int i;
+
+#ifdef DEBUG_MMU
+	printk(KERN_INFO "KVM: Flushing %d Shadow pPTEs: 0x%llx - 0x%llx\n",
+		vcpu->arch.hpte_cache_offset, pa_start, pa_end);
+#endif
+	BUG_ON(vcpu->arch.hpte_cache_offset > HPTEG_CACHE_NUM);
+
+	for (i=0; i<vcpu->arch.hpte_cache_offset; i++) {
+		struct hpte_cache *pte;
+
+		pte = &vcpu->arch.hpte_cache[i];
+		if (!pte->host_va)
+			continue;
+
+		if ((pte->pte.raddr >= pa_start) && (pte->pte.raddr < pa_end)) {
+#ifdef DEBUG_MMU
+	printk(KERN_INFO "KVM: Flushing SPT %d: 0x%llx (0x%llx) -> 0x%llx\n", i, pte->pte.eaddr, pte->pte.raddr, pte->host_va);
+#endif
+			ppc_md.hpte_invalidate(pte->slot, pte->host_va,
+					       MMU_PAGE_4K, MMU_SEGSIZE_256M,
+					       false);
+			pte->host_va = 0;
+			kvm_release_pfn_dirty(pte->pfn);
+		}
+	}
+}
+
+struct kvmppc_pte *kvmppc_mmu_find_pte(struct kvm_vcpu *vcpu, u64 ea, bool data)
+{
+	int i;
+	u64 guest_vp;
+
+	guest_vp = vcpu->arch.mmu.ea_to_vp(vcpu, ea, false);
+	for (i=0; i<vcpu->arch.hpte_cache_offset; i++) {
+		struct hpte_cache *pte;
+
+		pte = &vcpu->arch.hpte_cache[i];
+		if (!pte->host_va)
+			continue;
+
+		if (pte->pte.vpage == guest_vp)
+			return &pte->pte;
+	}
+
+	return NULL;
+}
+
+static int kvmppc_mmu_hpte_cache_next(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->arch.hpte_cache_offset == HPTEG_CACHE_NUM)
+		kvmppc_mmu_pte_flush(vcpu, 0, 0);
+
+	return vcpu->arch.hpte_cache_offset++;
+}
+
+/* We keep 512 gvsid->hvsid entries, mapping the guest ones to the array using
+ * a hash, so we don't waste cycles on looping */
+static u16 kvmppc_sid_hash(struct kvm_vcpu *vcpu, u64 gvsid)
+{
+	return (u16)(((gvsid >> (SID_MAP_BITS * 7)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 6)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 5)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 4)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 3)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 2)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 1)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 0)) & SID_MAP_MASK));
+}
+
+
+static struct kvmppc_sid_map *find_sid_vsid(struct kvm_vcpu *vcpu, u64 gvsid)
+{
+	struct kvmppc_sid_map *map;
+
+	if (vcpu->arch.msr & MSR_PR)
+		gvsid |= VSID_PR;
+
+	map = &to_book3s(vcpu)->sid_map[kvmppc_sid_hash(vcpu, gvsid)];
+	if (map->guest_vsid == gvsid) {
+#ifdef DEBUG_SLB
+		printk(KERN_INFO "SLB: Searching 0x%llx -> 0x%llx\n", gvsid, map->host_vsid);
+#endif
+		return map;
+	}
+
+	map = &to_book3s(vcpu)->sid_map[SID_MAP_MASK - kvmppc_sid_hash(vcpu, gvsid)];
+	if (map->guest_vsid == gvsid) {
+#ifdef DEBUG_SLB
+		printk(KERN_INFO "SLB: Searching 0x%llx -> 0x%llx\n", gvsid, map->host_vsid);
+#endif
+		return map;
+	}
+
+#ifdef DEBUG_SLB
+	printk(KERN_INFO "SLB: Searching 0x%llx -> not found\n", gvsid);
+#endif
+	return NULL;
+}
+
+int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte)
+{
+	pfn_t hpaddr;
+	ulong hash, hpteg, va;
+	u64 vsid;
+	int ret;
+	int rflags = 0x192;
+	int vflags = 0;
+	int attempt = 0;
+	struct kvmppc_sid_map *map;
+
+	/* Get host physical address for gpa */
+	down_read(&current->mm->mmap_sem);
+	hpaddr = gfn_to_pfn(vcpu->kvm, orig_pte->raddr >> PAGE_SHIFT);
+	if (kvm_is_error_hva(hpaddr)) {
+		printk(KERN_INFO "Couldn't get guest page for gfn %llx!\n", orig_pte->eaddr);
+		up_read(&current->mm->mmap_sem);
+		return -EINVAL;
+	}
+	hpaddr <<= PAGE_SHIFT;
+#if PAGE_SHIFT == 12
+#elif PAGE_SHIFT == 16
+	hpaddr |= orig_pte->raddr & 0xf000;
+#else
+#error Unknown page size
+#endif
+
+	up_read(&current->mm->mmap_sem);
+
+	/* and write the mapping ea -> hpa into the pt */
+	vcpu->arch.mmu.esid_to_vsid(vcpu, orig_pte->eaddr >> SID_SHIFT, &vsid);
+	map = find_sid_vsid(vcpu, vsid);
+	if (!map) {
+		kvmppc_mmu_map_segment(vcpu, orig_pte->eaddr);
+		map = find_sid_vsid(vcpu, vsid);
+	}
+	BUG_ON(!map);
+
+	vsid = map->host_vsid;
+	va = hpt_va(orig_pte->eaddr, vsid, MMU_SEGSIZE_256M);
+
+	if (!orig_pte->may_write)
+		rflags |= HPTE_R_PP;
+	else
+		mark_page_dirty(vcpu->kvm, orig_pte->raddr >> PAGE_SHIFT);
+
+	if (!orig_pte->may_execute)
+		rflags |= HPTE_R_N;
+
+	hash = hpt_hash(va, PTE_SIZE, MMU_SEGSIZE_256M);
+
+map_again:
+	hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
+
+	/* In case we tried normal mapping already, let's nuke old entries */
+	if (attempt > 1)
+		if (ppc_md.hpte_remove(hpteg) < 0)
+			return -1;
+
+	ret = ppc_md.hpte_insert(hpteg, va, hpaddr, rflags, vflags, MMU_PAGE_4K, MMU_SEGSIZE_256M);
+
+	if (ret < 0) {
+		/* If we couldn't map a primary PTE, try a secondary */
+#ifdef USE_SECONDARY
+		hash = ~hash;
+		attempt++;
+		if (attempt % 2)
+			vflags = HPTE_V_SECONDARY;
+		else
+			vflags = 0;
+#else
+		attempt = 2;
+#endif
+		goto map_again;
+	} else {
+		int hpte_id = kvmppc_mmu_hpte_cache_next(vcpu);
+		struct hpte_cache *pte = &vcpu->arch.hpte_cache[hpte_id];
+#ifdef DEBUG_MMU
+		printk(KERN_INFO "KVM: %c%c Map 0x%llx: [%lx] 0x%lx (0x%llx) -> %lx\n",
+				 ((rflags & HPTE_R_PP) == 3) ? '-' : 'w',
+				 (rflags & HPTE_R_N) ? '-' : 'x',
+				 orig_pte->eaddr, hpteg, va, orig_pte->vpage,
+				 hpaddr);
+#endif
+		pte->slot = hpteg + (ret & 7);
+		pte->host_va = va;
+		pte->pte = *orig_pte;
+		pte->pfn = hpaddr >> PAGE_SHIFT;
+	}
+
+	return 0;
+}
+
+static struct kvmppc_sid_map *create_sid_map(struct kvm_vcpu *vcpu, u64 gvsid)
+{
+	struct kvmppc_sid_map *map;
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	static int backwards_map = 0;
+
+	if (vcpu->arch.msr & MSR_PR)
+		gvsid |= VSID_PR;
+
+	/* We might get collisions that trap in preceding order, so let's
+	   map them differently */
+	if (backwards_map)
+		map = &to_book3s(vcpu)->sid_map[SID_MAP_MASK - kvmppc_sid_hash(vcpu, gvsid)];
+	else
+		map = &to_book3s(vcpu)->sid_map[kvmppc_sid_hash(vcpu, gvsid)];
+	backwards_map = !backwards_map;
+
+	// Uh-oh ... out of mappings. Let's flush!
+	if (vcpu_book3s->vsid_next == vcpu_book3s->vsid_max) {
+		vcpu_book3s->vsid_next = vcpu_book3s->vsid_first;
+		memset(vcpu_book3s->sid_map, 0,
+		       sizeof(struct kvmppc_sid_map) * SID_MAP_NUM);
+		kvmppc_mmu_pte_flush(vcpu, 0, 0);
+		kvmppc_mmu_flush_segments(vcpu);
+	}
+	map->host_vsid = vcpu_book3s->vsid_next++;
+
+	map->guest_vsid = gvsid;
+	map->valid = true;
+
+	return map;
+}
+
+static int kvmppc_mmu_next_segment(struct kvm_vcpu *vcpu, ulong esid)
+{
+	int i;
+	int max_slb_size = 64;
+	int found_inval = -1;
+	int r;
+
+	if (!get_paca()->kvm_slb_max)
+		get_paca()->kvm_slb_max = 1;
+
+	/* Are we overwriting? */
+	for (i = 1; i < get_paca()->kvm_slb_max; i++) {
+		if (!(get_paca()->kvm_slb[i].esid & SLB_ESID_V))
+			found_inval = i;
+		else if ((get_paca()->kvm_slb[i].esid & ESID_MASK) == esid)
+			return i;
+	}
+
+	/* Found a spare entry that was invalidated before */
+	if (found_inval > 0)
+		return found_inval;
+
+	/* No spare invalid entry, so create one */
+
+	if (mmu_slb_size < 64)
+		max_slb_size = mmu_slb_size;
+
+	/* Overflowing -> purge */
+	if ((get_paca()->kvm_slb_max) == max_slb_size)
+		kvmppc_mmu_flush_segments(vcpu);
+
+	r = get_paca()->kvm_slb_max;
+	get_paca()->kvm_slb_max++;
+
+	return r;
+}
+
+int kvmppc_mmu_map_segment(struct kvm_vcpu *vcpu, ulong eaddr)
+{
+	u64 esid = eaddr >> SID_SHIFT;
+	u64 slb_esid = (eaddr & ESID_MASK) | SLB_ESID_V;
+	u64 slb_vsid = SLB_VSID_USER;
+	u64 gvsid;
+	int slb_index;
+	struct kvmppc_sid_map *map;
+
+	slb_index = kvmppc_mmu_next_segment(vcpu, eaddr & ESID_MASK);
+
+	if (vcpu->arch.mmu.esid_to_vsid(vcpu, esid, &gvsid)) {
+		/* Invalidate an entry */
+		get_paca()->kvm_slb[slb_index].esid = 0;
+		return -ENOENT;
+	}
+
+	map = find_sid_vsid(vcpu, gvsid);
+	if (!map)
+		map = create_sid_map(vcpu, gvsid);
+
+	map->guest_esid = esid;
+
+	slb_vsid |= (map->host_vsid << 12);
+	slb_vsid &= ~SLB_VSID_KP;
+	slb_esid |= slb_index;
+
+	get_paca()->kvm_slb[slb_index].esid = slb_esid;
+	get_paca()->kvm_slb[slb_index].vsid = slb_vsid;
+
+#ifdef DEBUG_SLB
+	printk(KERN_INFO "slbmte %#llx, %#llx\n", slb_vsid, slb_esid);
+#endif
+
+	return 0;
+}
+
+void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu)
+{
+	get_paca()->kvm_slb_max = 1;
+	get_paca()->kvm_slb[0].esid = 0;
+}
+
+void kvmppc_mmu_destroy(struct kvm_vcpu *vcpu)
+{
+	kvmppc_mmu_pte_flush(vcpu, 0, 0);
+}
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 12/27] Add book3s_64 guest MMU
  2009-10-21 15:03                               ` Alexander Graf
@ 2009-10-21 15:03                                 ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti

To be able to run a guest, we also need to implement a guest MMU.

This patch adds MMU handling for Book3s_64 guests.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
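
For reference, a minimal sketch of how callers are expected to go through the
guest MMU this patch provides, once kvmppc_mmu_book3s_64_init() has filled in
the vcpu's callback table (not part of the diff below; xlate_example() is only
an illustrative placeholder name, struct kvmppc_pte and the vcpu->arch.mmu
hooks are assumed to come from asm/kvm_book3s.h and asm/kvm_host.h):

#include <linux/kvm_host.h>
#include <asm/kvm_book3s.h>

/* Illustrative only: translate a guest effective address to a guest real
 * address through whatever guest MMU backend is currently installed. */
static int xlate_example(struct kvm_vcpu *vcpu, gva_t eaddr, bool data)
{
	struct kvmppc_pte pte;
	int r;

	r = vcpu->arch.mmu.xlate(vcpu, eaddr, &pte, data);
	if (r < 0)
		return r;	/* -EINVAL, -ENOENT or -EPERM, see xlate below */

	/* pte.raddr now holds the guest real address, pte.may_* the
	 * permissions granted by the guest PTE. */
	return 0;
}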
 arch/powerpc/kvm/book3s_64_mmu.c |  469 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 469 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_mmu.c

diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
new file mode 100644
index 0000000..be9c846
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_mmu.c
@@ -0,0 +1,469 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/highmem.h>
+
+#include <asm/tlbflush.h>
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+
+// #define DEBUG_MMU
+
+static void kvmppc_mmu_book3s_64_reset_msr(struct kvm_vcpu *vcpu)
+{
+	kvmppc_set_msr(vcpu, MSR_SF);
+}
+
+static struct kvmppc_slb *kvmppc_mmu_book3s_64_find_slbe(struct kvmppc_vcpu_book3s *vcpu_book3s,
+						   gva_t eaddr)
+{
+	int i;
+	u64 esid = GET_ESID(eaddr);
+	u64 esid_1t = GET_ESID_1T(eaddr);
+
+	for (i = 0; i < vcpu_book3s->slb_nr; i++) {
+		u64 cmp_esid = esid;
+
+		if (!vcpu_book3s->slb[i].valid)
+			continue;
+
+		if (vcpu_book3s->slb[i].large)
+			cmp_esid = esid_1t;
+
+		if (vcpu_book3s->slb[i].esid == cmp_esid)
+			return &vcpu_book3s->slb[i];
+	}
+
+#ifdef DEBUG_MMU
+	printk(KERN_ERR "KVM: No SLB entry found for 0x%lx [%llx | %llx]\n", eaddr, esid, esid_1t);
+	for (i = 0; i < vcpu_book3s->slb_nr; i++) {
+	    if (vcpu_book3s->slb[i].vsid)
+		printk(KERN_ERR "  %d: %c%c %llx %llx\n", i, vcpu_book3s->slb[i].valid ? 'v' : ' ',
+							vcpu_book3s->slb[i].large ? 'l' : ' ',
+							vcpu_book3s->slb[i].esid,
+							vcpu_book3s->slb[i].vsid);
+	}
+#endif
+
+	return NULL;
+}
+
+static u64 kvmppc_mmu_book3s_64_ea_to_vp(struct kvm_vcpu *vcpu, gva_t eaddr, bool data)
+{
+	struct kvmppc_slb *slb = kvmppc_mmu_book3s_64_find_slbe(to_book3s(vcpu), eaddr);
+
+	if (!slb)
+		return 0;
+
+	if (slb->large)
+		return (((u64)eaddr >> 12) & 0xfffffff) | (((u64)slb->vsid) << 28);
+
+	return (((u64)eaddr >> 12) & 0xffff) | (((u64)slb->vsid) << 16);
+}
+
+static int kvmppc_mmu_book3s_64_get_pagesize(struct kvmppc_slb *slbe)
+{
+	return slbe->large ? 24 : 12;
+}
+
+static u32 kvmppc_mmu_book3s_64_get_page(struct kvmppc_slb *slbe, gva_t eaddr)
+{
+	int p = kvmppc_mmu_book3s_64_get_pagesize(slbe);
+	return ((eaddr & 0xfffffff) >> p);
+}
+
+static hva_t kvmppc_mmu_book3s_64_get_pteg(struct kvmppc_vcpu_book3s *vcpu_book3s,
+				     struct kvmppc_slb *slbe, gva_t eaddr,
+				     bool second)
+{
+	u64 hash, pteg, htabsize;
+	u32 page;
+	hva_t r;
+
+	page = kvmppc_mmu_book3s_64_get_page(slbe, eaddr);
+	htabsize = ((1 << ((vcpu_book3s->sdr1 & 0x1f) + 11)) - 1);
+
+	hash = slbe->vsid ^ page;
+	if (second)
+		hash = ~hash;
+	hash &= ((1ULL << 39ULL) - 1ULL);
+	hash &= htabsize;
+	hash <<= 7ULL;
+
+	pteg = vcpu_book3s->sdr1 & 0xfffffffffffc0000ULL;
+	pteg |= hash;
+
+#ifdef DEBUG_MMU
+	printk(KERN_INFO "MMU: page=0x%x sdr1=0x%llx pteg=0x%llx vsid=0x%llx\n", page, vcpu_book3s->sdr1, pteg, slbe->vsid);
+#endif
+
+	r = gfn_to_hva(vcpu_book3s->vcpu.kvm, pteg >> PAGE_SHIFT);
+	if (kvm_is_error_hva(r))
+		return r;
+	return r | (pteg & ~PAGE_MASK);
+}
+
+static u64 kvmppc_mmu_book3s_64_get_avpn(struct kvmppc_slb *slbe, gva_t eaddr)
+{
+	int p = kvmppc_mmu_book3s_64_get_pagesize(slbe);
+	u64 avpn;
+
+	avpn = kvmppc_mmu_book3s_64_get_page(slbe, eaddr);
+	avpn |= slbe->vsid << (28 - p);
+
+	if (p < 24)
+		avpn >>= ((80 - p) - 56) - 8;
+	else
+		avpn <<= 8;
+
+	return avpn;
+}
+
+static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
+				struct kvmppc_pte *gpte, bool data)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_slb *slbe;
+	hva_t ptegp;
+	u64 pteg[16];
+	u64 avpn = 0;
+	int i;
+	u8 key = 0;
+	bool found = false;
+	bool perm_err = false;
+	int second = 0;
+
+	slbe = kvmppc_mmu_book3s_64_find_slbe(vcpu_book3s, eaddr);
+	if (!slbe)
+		goto no_seg_found;
+
+do_second:
+	ptegp = kvmppc_mmu_book3s_64_get_pteg(vcpu_book3s, slbe, eaddr, second);
+	if (kvm_is_error_hva(ptegp))
+		goto no_page_found;
+
+	avpn = kvmppc_mmu_book3s_64_get_avpn(slbe, eaddr);
+
+	if(copy_from_user(pteg, (void __user *)ptegp, sizeof(pteg))) {
+		printk(KERN_ERR "KVM can't copy data from 0x%lx!\n", ptegp);
+		goto no_page_found;
+	}
+
+	if ((vcpu->arch.msr & MSR_PR) && slbe->Kp)
+		key = 4;
+	else if (!(vcpu->arch.msr & MSR_PR) && slbe->Ks)
+		key = 4;
+
+	for (i=0; i<16; i+=2) {
+		u64 v = pteg[i];
+		u64 r = pteg[i+1];
+
+		// Valid check
+		if (!(v & HPTE_V_VALID))
+			continue;
+		// Hash check
+		if ((v & HPTE_V_SECONDARY) != second)
+			continue;
+
+		// AVPN compare
+		if (HPTE_V_AVPN_VAL(avpn) == HPTE_V_AVPN_VAL(v)) {
+			u8 pp = (r & HPTE_R_PP) | key;
+			int eaddr_mask = 0xFFF;
+
+			gpte->eaddr = eaddr;
+			gpte->vpage = kvmppc_mmu_book3s_64_ea_to_vp(vcpu, eaddr, data);
+			if (slbe->large)
+				eaddr_mask = 0xFFFFFF;
+			gpte->raddr = (r & HPTE_R_RPN) | (eaddr & eaddr_mask);
+			gpte->may_execute = ((r & HPTE_R_N) ? false : true);
+			gpte->may_read = false;
+			gpte->may_write = false;
+
+			switch (pp) {
+			case 0:
+			case 1:
+			case 2:
+			case 6:
+				gpte->may_write = true;
+				/* fall through */
+			case 3:
+			case 5:
+			case 7:
+				gpte->may_read = true;
+				break;
+			}
+
+			if (!gpte->may_read) {
+				perm_err = true;
+				continue;
+			}
+#ifdef DEBUG_MMU
+			printk(KERN_ERR "KVM MMU: Translated 0x%lx [0x%llx] -> 0x%llx -> 0x%llx\n",
+					eaddr, avpn, gpte->vpage, gpte->raddr);
+#endif
+			found = true;
+			break;
+		}
+	}
+
+	// Update PTE R and C bits, so the guest's swapper knows we used the page
+	if (found) {
+		u32 oldr = pteg[i+1];
+
+		if (gpte->may_read)
+			pteg[i+1] |= HPTE_R_R; // Accessed flag
+		if (gpte->may_write)
+			pteg[i+1] |= HPTE_R_C; // Dirty flag
+#ifdef DEBUG_MMU_PTE
+		else
+			printk(KERN_INFO "KVM: Mapping read-only page!\n");
+#endif
+
+		// Write back into the PTEG
+		if (pteg[i+1] != oldr)
+			copy_to_user((void __user *)ptegp, pteg, sizeof(pteg));
+
+		return 0;
+	} else {
+#ifdef DEBUG_MMU
+		printk(KERN_ERR "KVM MMU: No PTE found (ea=0x%lx sdr1=0x%llx ptegp=0x%lx)\n",
+				eaddr, to_book3s(vcpu)->sdr1, ptegp);
+		for (i=0; i<16; i+=2)
+			printk(KERN_ERR "   %02d: 0x%llx - 0x%llx (0x%llx)\n",
+					i, pteg[i], pteg[i+1], avpn);
+#endif
+
+		if (!second) {
+			second = HPTE_V_SECONDARY;
+			goto do_second;
+		}
+	}
+
+
+no_page_found:
+
+
+	if (perm_err)
+		return -EPERM;
+
+	return -ENOENT;
+
+no_seg_found:
+
+#ifdef DEBUG_MMU
+	printk(KERN_ERR "KVM MMU: Trigger segment fault\n");
+#endif
+	return -EINVAL;
+}
+
+static void kvmppc_mmu_book3s_64_slbmte(struct kvm_vcpu *vcpu, u64 rs, u64 rb)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s;
+	u64 esid, esid_1t;
+	int slb_nr;
+	struct kvmppc_slb *slbe;
+
+#ifdef DEBUG_MMU
+	printk(KERN_ERR "KVM MMU: slbmte(0x%llx, 0x%llx)\n", rs, rb);
+#endif
+
+	vcpu_book3s = to_book3s(vcpu);
+
+	esid = GET_ESID(rb);
+	esid_1t = GET_ESID_1T(rb);
+	slb_nr = rb & 0xfff;
+
+	if (slb_nr > vcpu_book3s->slb_nr)
+		return;
+
+	slbe = &vcpu_book3s->slb[slb_nr];
+
+	slbe->large = (rs & SLB_VSID_L) ? 1 : 0;
+	slbe->esid  = slbe->large ? esid_1t : esid;
+	slbe->vsid  = rs >> 12;
+	slbe->valid = (rb & SLB_ESID_V) ? 1 : 0;
+	slbe->Ks    = (rs & SLB_VSID_KS) ? 1 : 0;
+	slbe->Kp    = (rs & SLB_VSID_KP) ? 1 : 0;
+	slbe->nx    = (rs & SLB_VSID_N) ? 1 : 0;
+	slbe->class = (rs & SLB_VSID_C) ? 1 : 0;
+
+	slbe->orige = rb & (ESID_MASK | SLB_ESID_V);
+	slbe->origv = rs;
+
+	/* Map the new segment */
+	kvmppc_mmu_map_segment(vcpu, esid << SID_SHIFT);
+}
+
+static u64 kvmppc_mmu_book3s_64_slbmfee(struct kvm_vcpu *vcpu, u64 slb_nr)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_slb *slbe;
+
+	if (slb_nr > vcpu_book3s->slb_nr)
+		return 0;
+
+	slbe = &vcpu_book3s->slb[slb_nr];
+
+	return slbe->orige;
+}
+
+static u64 kvmppc_mmu_book3s_64_slbmfev(struct kvm_vcpu *vcpu, u64 slb_nr)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_slb *slbe;
+
+	if (slb_nr > vcpu_book3s->slb_nr)
+		return 0;
+
+	slbe = &vcpu_book3s->slb[slb_nr];
+
+	return slbe->origv;
+}
+
+static void kvmppc_mmu_book3s_64_slbie(struct kvm_vcpu *vcpu, u64 ea)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_slb *slbe;
+
+#ifdef DEBUG_MMU
+	printk(KERN_ERR "KVM MMU: slbie(0x%llx)\n", ea);
+#endif
+
+	slbe = kvmppc_mmu_book3s_64_find_slbe(vcpu_book3s, ea);
+
+	if (!slbe)
+		return;
+
+#ifdef DEBUG_MMU
+	printk(KERN_ERR "KVM MMU: slbie(0x%llx, 0x%llx)\n", ea, slbe->esid);
+#endif
+
+	slbe->valid = false;
+
+	kvmppc_mmu_map_segment(vcpu, ea);
+}
+
+static void kvmppc_mmu_book3s_64_slbia(struct kvm_vcpu *vcpu)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	int i;
+
+#ifdef DEBUG_MMU
+	printk(KERN_ERR "KVM MMU: slbia()\n");
+#endif
+	for (i = 1; i < vcpu_book3s->slb_nr; i++)
+		vcpu_book3s->slb[i].valid = false;
+
+	if (vcpu->arch.msr & MSR_IR) {
+		kvmppc_mmu_flush_segments(vcpu);
+		kvmppc_mmu_map_segment(vcpu, vcpu->arch.pc);
+	}
+}
+
+static void kvmppc_mmu_book3s_64_mtsrin(struct kvm_vcpu *vcpu, u32 srnum, ulong value)
+{
+	u64 rb = 0, rs = 0;
+
+	/* ESID = srnum */
+	rb |= (srnum & 0xf) << 28;
+	/* Set the valid bit */
+	rb |= 1 << 27;
+	/* Index = ESID */
+	rb |= srnum;
+
+	/* VSID = VSID */
+	rs |= (value & 0xfffffff) << 12;
+	/* flags = flags */
+	rs |= ((value >> 27) & 0xf) << 9;
+
+	kvmppc_mmu_book3s_64_slbmte(vcpu, rs, rb);
+}
+
+static void kvmppc_mmu_book3s_64_tlbie(struct kvm_vcpu *vcpu, ulong va, bool large)
+{
+	u64 mask = 0xFFFFFFFFFULL;
+#ifdef DEBUG_MMU
+	printk(KERN_ERR "KVM MMU: tlbie(0x%lx)\n", va);
+#endif
+	if (large)
+		mask = 0xFFFFFF000ULL;
+	kvmppc_mmu_pte_vflush(vcpu, va >> 12, mask);
+}
+
+static int kvmppc_mmu_book3s_64_esid_to_vsid(struct kvm_vcpu *vcpu, u64 esid,
+					     u64 *vsid)
+{
+	switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
+	case 0:
+		*vsid = (VSID_REAL >> 16) | esid;
+		break;
+	case MSR_IR:
+		*vsid = (VSID_REAL_IR >> 16) | esid;
+		break;
+	case MSR_DR:
+		*vsid = (VSID_REAL_DR >> 16) | esid;
+		break;
+	case MSR_DR|MSR_IR:
+	{
+		ulong ea;
+		struct kvmppc_slb *slb;
+		ea = esid << SID_SHIFT;
+		slb = kvmppc_mmu_book3s_64_find_slbe(to_book3s(vcpu), ea);
+		if (slb)
+			*vsid = slb->vsid;
+		else
+			return -ENOENT;
+
+		break;
+	}
+	default:
+		BUG();
+		break;
+	}
+
+	return 0;
+}
+
+static bool kvmppc_mmu_book3s_64_is_dcbz32(struct kvm_vcpu *vcpu)
+{
+	return (to_book3s(vcpu)->hid[5] & 0x80);
+}
+
+void kvmppc_mmu_book3s_64_init(struct kvm_vcpu *vcpu)
+{
+	struct kvmppc_mmu *mmu = &vcpu->arch.mmu;
+
+	mmu->mfsrin = NULL;
+	mmu->mtsrin = kvmppc_mmu_book3s_64_mtsrin;
+	mmu->slbmte = kvmppc_mmu_book3s_64_slbmte;
+	mmu->slbmfee = kvmppc_mmu_book3s_64_slbmfee;
+	mmu->slbmfev = kvmppc_mmu_book3s_64_slbmfev;
+	mmu->slbie = kvmppc_mmu_book3s_64_slbie;
+	mmu->slbia = kvmppc_mmu_book3s_64_slbia;
+	mmu->xlate = kvmppc_mmu_book3s_64_xlate;
+	mmu->reset_msr = kvmppc_mmu_book3s_64_reset_msr;
+	mmu->tlbie = kvmppc_mmu_book3s_64_tlbie;
+	mmu->esid_to_vsid = kvmppc_mmu_book3s_64_esid_to_vsid;
+	mmu->ea_to_vp = kvmppc_mmu_book3s_64_ea_to_vp;
+	mmu->is_dcbz32 = kvmppc_mmu_book3s_64_is_dcbz32;
+}
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 12/27] Add book3s_64 guest MMU
@ 2009-10-21 15:03                                 ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti

To be able to run a guest, we also need to implement a guest MMU.

This patch adds MMU handling for Book3s_64 guests.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kvm/book3s_64_mmu.c |  469 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 469 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_mmu.c

diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
new file mode 100644
index 0000000..be9c846
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_mmu.c
@@ -0,0 +1,469 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/highmem.h>
+
+#include <asm/tlbflush.h>
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+
+// #define DEBUG_MMU
+
+static void kvmppc_mmu_book3s_64_reset_msr(struct kvm_vcpu *vcpu)
+{
+	kvmppc_set_msr(vcpu, MSR_SF);
+}
+
+static struct kvmppc_slb *kvmppc_mmu_book3s_64_find_slbe(struct kvmppc_vcpu_book3s *vcpu_book3s,
+						   gva_t eaddr)
+{
+	int i;
+	u64 esid = GET_ESID(eaddr);
+	u64 esid_1t = GET_ESID_1T(eaddr);
+
+	for (i = 0; i < vcpu_book3s->slb_nr; i++) {
+		u64 cmp_esid = esid;
+
+		if (!vcpu_book3s->slb[i].valid)
+			continue;
+
+		if (vcpu_book3s->slb[i].large)
+			cmp_esid = esid_1t;
+
+		if (vcpu_book3s->slb[i].esid == cmp_esid)
+			return &vcpu_book3s->slb[i];
+	}
+
+#ifdef DEBUG_MMU
+	printk(KERN_ERR "KVM: No SLB entry found for 0x%lx [%llx | %llx]\n", eaddr, esid, esid_1t);
+	for (i = 0; i < vcpu_book3s->slb_nr; i++) {
+	    if (vcpu_book3s->slb[i].vsid)
+		printk(KERN_ERR "  %d: %c%c %llx %llx\n", i, vcpu_book3s->slb[i].valid ? 'v' : ' ',
+							vcpu_book3s->slb[i].large ? 'l' : ' ',
+							vcpu_book3s->slb[i].esid,
+							vcpu_book3s->slb[i].vsid);
+	}
+#endif
+
+	return NULL;
+}
+
+static u64 kvmppc_mmu_book3s_64_ea_to_vp(struct kvm_vcpu *vcpu, gva_t eaddr, bool data)
+{
+	struct kvmppc_slb *slb = kvmppc_mmu_book3s_64_find_slbe(to_book3s(vcpu), eaddr);
+
+	if (!slb)
+		return 0;
+
+	if (slb->large)
+		return (((u64)eaddr >> 12) & 0xfffffff) | (((u64)slb->vsid) << 28);
+
+	return (((u64)eaddr >> 12) & 0xffff) | (((u64)slb->vsid) << 16);
+}
+
+static int kvmppc_mmu_book3s_64_get_pagesize(struct kvmppc_slb *slbe)
+{
+	return slbe->large ? 24 : 12;
+}
+
+static u32 kvmppc_mmu_book3s_64_get_page(struct kvmppc_slb *slbe, gva_t eaddr)
+{
+	int p = kvmppc_mmu_book3s_64_get_pagesize(slbe);
+	return ((eaddr & 0xfffffff) >> p);
+}
+
+static hva_t kvmppc_mmu_book3s_64_get_pteg(struct kvmppc_vcpu_book3s *vcpu_book3s,
+				     struct kvmppc_slb *slbe, gva_t eaddr,
+				     bool second)
+{
+	u64 hash, pteg, htabsize;
+	u32 page;
+	hva_t r;
+
+	page = kvmppc_mmu_book3s_64_get_page(slbe, eaddr);
+	htabsize = ((1 << ((vcpu_book3s->sdr1 & 0x1f) + 11)) - 1);
+
+	hash = slbe->vsid ^ page;
+	if (second)
+		hash = ~hash;
+	hash &= ((1ULL << 39ULL) - 1ULL);
+	hash &= htabsize;
+	hash <<= 7ULL;
+
+	pteg = vcpu_book3s->sdr1 & 0xfffffffffffc0000ULL;
+	pteg |= hash;
+
+#ifdef DEBUG_MMU
+	printk(KERN_INFO "MMU: page=0x%x sdr1=0x%llx pteg=0x%llx vsid=0x%llx\n", page, vcpu_book3s->sdr1, pteg, slbe->vsid);
+#endif
+
+	r = gfn_to_hva(vcpu_book3s->vcpu.kvm, pteg >> PAGE_SHIFT);
+	if (kvm_is_error_hva(r))
+		return r;
+	return r | (pteg & ~PAGE_MASK);
+}
+
+static u64 kvmppc_mmu_book3s_64_get_avpn(struct kvmppc_slb *slbe, gva_t eaddr)
+{
+	int p = kvmppc_mmu_book3s_64_get_pagesize(slbe);
+	u64 avpn;
+
+	avpn = kvmppc_mmu_book3s_64_get_page(slbe, eaddr);
+	avpn |= slbe->vsid << (28 - p);
+
+	if (p < 24)
+		avpn >>= ((80 - p) - 56) - 8;
+	else
+		avpn <<= 8;
+
+	return avpn;
+}
+
+static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
+				struct kvmppc_pte *gpte, bool data)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_slb *slbe;
+	hva_t ptegp;
+	u64 pteg[16];
+	u64 avpn = 0;
+	int i;
+	u8 key = 0;
+	bool found = false;
+	bool perm_err = false;
+	int second = 0;
+
+	slbe = kvmppc_mmu_book3s_64_find_slbe(vcpu_book3s, eaddr);
+	if (!slbe)
+		goto no_seg_found;
+
+do_second:
+	ptegp = kvmppc_mmu_book3s_64_get_pteg(vcpu_book3s, slbe, eaddr, second);
+	if (kvm_is_error_hva(ptegp))
+		goto no_page_found;
+
+	avpn = kvmppc_mmu_book3s_64_get_avpn(slbe, eaddr);
+
+	if(copy_from_user(pteg, (void __user *)ptegp, sizeof(pteg))) {
+		printk(KERN_ERR "KVM can't copy data from 0x%lx!\n", ptegp);
+		goto no_page_found;
+	}
+
+	if ((vcpu->arch.msr & MSR_PR) && slbe->Kp)
+		key = 4;
+	else if (!(vcpu->arch.msr & MSR_PR) && slbe->Ks)
+		key = 4;
+
+	for (i=0; i<16; i+=2) {
+		u64 v = pteg[i];
+		u64 r = pteg[i+1];
+
+		// Valid check
+		if (!(v & HPTE_V_VALID))
+			continue;
+		// Hash check
+		if ((v & HPTE_V_SECONDARY) != second)
+			continue;
+
+		// AVPN compare
+		if (HPTE_V_AVPN_VAL(avpn) == HPTE_V_AVPN_VAL(v)) {
+			u8 pp = (r & HPTE_R_PP) | key;
+			int eaddr_mask = 0xFFF;
+
+			gpte->eaddr = eaddr;
+			gpte->vpage = kvmppc_mmu_book3s_64_ea_to_vp(vcpu, eaddr, data);
+			if (slbe->large)
+				eaddr_mask = 0xFFFFFF;
+			gpte->raddr = (r & HPTE_R_RPN) | (eaddr & eaddr_mask);
+			gpte->may_execute = ((r & HPTE_R_N) ? false : true);
+			gpte->may_read = false;
+			gpte->may_write = false;
+
+			switch (pp) {
+			case 0:
+			case 1:
+			case 2:
+			case 6:
+				gpte->may_write = true;
+				/* fall through */
+			case 3:
+			case 5:
+			case 7:
+				gpte->may_read = true;
+				break;
+			}
+
+			if (!gpte->may_read) {
+				perm_err = true;
+				continue;
+			}
+#ifdef DEBUG_MMU
+			printk(KERN_ERR "KVM MMU: Translated 0x%lx [0x%llx] -> 0x%llx -> 0x%llx\n",
+					eaddr, avpn, gpte->vpage, gpte->raddr);
+#endif
+			found = true;
+			break;
+		}
+	}
+
+	// Update PTE R and C bits, so the guest's swapper knows we used the page
+	if (found) {
+		u32 oldr = pteg[i+1];
+
+		if (gpte->may_read)
+			pteg[i+1] |= HPTE_R_R; // Accessed flag
+		if (gpte->may_write)
+			pteg[i+1] |= HPTE_R_C; // Dirty flag
+#ifdef DEBUG_MMU_PTE
+		else
+			printk(KERN_INFO "KVM: Mapping read-only page!\n");
+#endif
+
+		// Write back into the PTEG
+		if (pteg[i+1] != oldr)
+			copy_to_user((void __user *)ptegp, pteg, sizeof(pteg));
+
+		return 0;
+	} else {
+#ifdef DEBUG_MMU
+		printk(KERN_ERR "KVM MMU: No PTE found (ea=0x%lx sdr1=0x%llx ptegp=0x%lx)\n",
+				eaddr, to_book3s(vcpu)->sdr1, ptegp);
+		for (i=0; i<16; i+=2)
+			printk(KERN_ERR "   %02d: 0x%llx - 0x%llx (0x%llx)\n",
+					i, pteg[i], pteg[i+1], avpn);
+#endif
+
+		if (!second) {
+			second = HPTE_V_SECONDARY;
+			goto do_second;
+		}
+	}
+
+
+no_page_found:
+
+
+	if (perm_err)
+		return -EPERM;
+
+	return -ENOENT;
+
+no_seg_found:
+
+#ifdef DEBUG_MMU
+	printk(KERN_ERR "KVM MMU: Trigger segment fault\n");
+#endif
+	return -EINVAL;
+}
+
+static void kvmppc_mmu_book3s_64_slbmte(struct kvm_vcpu *vcpu, u64 rs, u64 rb)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s;
+	u64 esid, esid_1t;
+	int slb_nr;
+	struct kvmppc_slb *slbe;
+
+#ifdef DEBUG_MMU
+	printk(KERN_ERR "KVM MMU: slbmte(0x%llx, 0x%llx)\n", rs, rb);
+#endif
+
+	vcpu_book3s = to_book3s(vcpu);
+
+	esid = GET_ESID(rb);
+	esid_1t = GET_ESID_1T(rb);
+	slb_nr = rb & 0xfff;
+
+	if (slb_nr > vcpu_book3s->slb_nr)
+		return;
+
+	slbe = &vcpu_book3s->slb[slb_nr];
+
+	slbe->large = (rs & SLB_VSID_L) ? 1 : 0;
+	slbe->esid  = slbe->large ? esid_1t : esid;
+	slbe->vsid  = rs >> 12;
+	slbe->valid = (rb & SLB_ESID_V) ? 1 : 0;
+	slbe->Ks    = (rs & SLB_VSID_KS) ? 1 : 0;
+	slbe->Kp    = (rs & SLB_VSID_KP) ? 1 : 0;
+	slbe->nx    = (rs & SLB_VSID_N) ? 1 : 0;
+	slbe->class = (rs & SLB_VSID_C) ? 1 : 0;
+
+	slbe->orige = rb & (ESID_MASK | SLB_ESID_V);
+	slbe->origv = rs;
+
+	/* Map the new segment */
+	kvmppc_mmu_map_segment(vcpu, esid << SID_SHIFT);
+}
+
+static u64 kvmppc_mmu_book3s_64_slbmfee(struct kvm_vcpu *vcpu, u64 slb_nr)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_slb *slbe;
+
+	if (slb_nr > vcpu_book3s->slb_nr)
+		return 0;
+
+	slbe = &vcpu_book3s->slb[slb_nr];
+
+	return slbe->orige;
+}
+
+static u64 kvmppc_mmu_book3s_64_slbmfev(struct kvm_vcpu *vcpu, u64 slb_nr)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_slb *slbe;
+
+	if (slb_nr > vcpu_book3s->slb_nr)
+		return 0;
+
+	slbe = &vcpu_book3s->slb[slb_nr];
+
+	return slbe->origv;
+}
+
+static void kvmppc_mmu_book3s_64_slbie(struct kvm_vcpu *vcpu, u64 ea)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_slb *slbe;
+
+#ifdef DEBUG_MMU
+	printk(KERN_ERR "KVM MMU: slbie(0x%llx)\n", ea);
+#endif
+
+	slbe = kvmppc_mmu_book3s_64_find_slbe(vcpu_book3s, ea);
+
+	if (!slbe)
+		return;
+
+#ifdef DEBUG_MMU
+	printk(KERN_ERR "KVM MMU: slbie(0x%llx, 0x%llx)\n", ea, slbe->esid);
+#endif
+
+	slbe->valid = false;
+
+	kvmppc_mmu_map_segment(vcpu, ea);
+}
+
+static void kvmppc_mmu_book3s_64_slbia(struct kvm_vcpu *vcpu)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	int i;
+
+#ifdef DEBUG_MMU
+	printk(KERN_ERR "KVM MMU: slbia()\n");
+#endif
+	for (i = 1; i < vcpu_book3s->slb_nr; i++)
+		vcpu_book3s->slb[i].valid = false;
+
+	if (vcpu->arch.msr & MSR_IR) {
+		kvmppc_mmu_flush_segments(vcpu);
+		kvmppc_mmu_map_segment(vcpu, vcpu->arch.pc);
+	}
+}
+
+static void kvmppc_mmu_book3s_64_mtsrin(struct kvm_vcpu *vcpu, u32 srnum, ulong value)
+{
+	u64 rb = 0, rs = 0;
+
+	/* ESID = srnum */
+	rb |= (srnum & 0xf) << 28;
+	/* Set the valid bit */
+	rb |= 1 << 27;
+	/* Index = ESID */
+	rb |= srnum;
+
+	/* VSID = VSID */
+	rs |= (value & 0xfffffff) << 12;
+	/* flags = flags */
+	rs |= ((value >> 27) & 0xf) << 9;
+
+	kvmppc_mmu_book3s_64_slbmte(vcpu, rs, rb);
+}
+
+static void kvmppc_mmu_book3s_64_tlbie(struct kvm_vcpu *vcpu, ulong va, bool large)
+{
+	u64 mask = 0xFFFFFFFFFULL;
+#ifdef DEBUG_MMU
+	printk(KERN_ERR "KVM MMU: tlbie(0x%lx)\n", va);
+#endif
+	if (large)
+		mask = 0xFFFFFF000ULL;
+	kvmppc_mmu_pte_vflush(vcpu, va >> 12, mask);
+}
+
+static int kvmppc_mmu_book3s_64_esid_to_vsid(struct kvm_vcpu *vcpu, u64 esid,
+					     u64 *vsid)
+{
+	switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
+	case 0:
+		*vsid = (VSID_REAL >> 16) | esid;
+		break;
+	case MSR_IR:
+		*vsid = (VSID_REAL_IR >> 16) | esid;
+		break;
+	case MSR_DR:
+		*vsid = (VSID_REAL_DR >> 16) | esid;
+		break;
+	case MSR_DR|MSR_IR:
+	{
+		ulong ea;
+		struct kvmppc_slb *slb;
+		ea = esid << SID_SHIFT;
+		slb = kvmppc_mmu_book3s_64_find_slbe(to_book3s(vcpu), ea);
+		if (slb)
+			*vsid = slb->vsid;
+		else
+			return -ENOENT;
+
+		break;
+	}
+	default:
+		BUG();
+		break;
+	}
+
+	return 0;
+}
+
+static bool kvmppc_mmu_book3s_64_is_dcbz32(struct kvm_vcpu *vcpu)
+{
+	return (to_book3s(vcpu)->hid[5] & 0x80);
+}
+
+void kvmppc_mmu_book3s_64_init(struct kvm_vcpu *vcpu)
+{
+	struct kvmppc_mmu *mmu = &vcpu->arch.mmu;
+
+	mmu->mfsrin = NULL;
+	mmu->mtsrin = kvmppc_mmu_book3s_64_mtsrin;
+	mmu->slbmte = kvmppc_mmu_book3s_64_slbmte;
+	mmu->slbmfee = kvmppc_mmu_book3s_64_slbmfee;
+	mmu->slbmfev = kvmppc_mmu_book3s_64_slbmfev;
+	mmu->slbie = kvmppc_mmu_book3s_64_slbie;
+	mmu->slbia = kvmppc_mmu_book3s_64_slbia;
+	mmu->xlate = kvmppc_mmu_book3s_64_xlate;
+	mmu->reset_msr = kvmppc_mmu_book3s_64_reset_msr;
+	mmu->tlbie = kvmppc_mmu_book3s_64_tlbie;
+	mmu->esid_to_vsid = kvmppc_mmu_book3s_64_esid_to_vsid;
+	mmu->ea_to_vp = kvmppc_mmu_book3s_64_ea_to_vp;
+	mmu->is_dcbz32 = kvmppc_mmu_book3s_64_is_dcbz32;
+}
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 13/27] Add book3s_32 guest MMU
       [not found]                                 ` <1256137413-15256-13-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
@ 2009-10-21 15:03                                     ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti

This patch adds an implementation for a G3/G4 MMU, so we can run G3 and
G4 guests in KVM on Book3s_64.
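
The core of the diff below is the hashed page table walk. As a rough,
standalone sketch of how the guest PTE group address is derived from SDR1,
the segment's VSID and the effective address (illustrative code only, not
part of the patch; it mirrors kvmppc_mmu_book3s_32_get_pteg() further down):

    #include <stdint.h>

    /* Sketch: forming the guest PTEG address on a hash-MMU 32-bit core. */
    static uint32_t example_pteg_addr(uint32_t sdr1, uint32_t vsid,
                                      uint32_t eaddr, int primary)
    {
        uint32_t page     = (eaddr & 0x0fffffff) >> 12;        /* page index inside the segment */
        uint32_t htabmask = ((sdr1 & 0x1ff) << 16) | 0xffc0;   /* mask built from SDR1's HTABMASK */
        uint32_t hash     = (vsid ^ page) << 6;                /* primary hash */

        if (!primary)
            hash = ~hash;                                      /* secondary hash is the complement */
        hash &= htabmask;

        return (sdr1 & 0xffff0000) | hash;                     /* HTABORG | hashed offset */
    }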

Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
---
 arch/powerpc/kvm/book3s_32_mmu.c |  354 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 354 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_32_mmu.c

diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
new file mode 100644
index 0000000..134c186
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -0,0 +1,354 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
+ */
+
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/highmem.h>
+
+#include <asm/tlbflush.h>
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+
+// #define DEBUG_MMU
+// #define DEBUG_MMU_PTE
+// #define DEBUG_MMU_PTE_IP 0xfff14c40
+
+static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *pte, bool data);
+
+static struct kvmppc_sr *kvmppc_mmu_book3s_32_find_sr(
+		struct kvmppc_vcpu_book3s *vcpu_book3s, gva_t eaddr)
+{
+	return &vcpu_book3s->sr[(eaddr >> 28) & 0xf];
+}
+
+static u64 kvmppc_mmu_book3s_32_ea_to_vp(struct kvm_vcpu *vcpu, gva_t eaddr, bool data)
+{
+	struct kvmppc_sr *sre = kvmppc_mmu_book3s_32_find_sr(to_book3s(vcpu), eaddr);
+	struct kvmppc_pte pte;
+
+	if (!kvmppc_mmu_book3s_32_xlate_bat(vcpu, eaddr, &pte, data))
+		return pte.vpage;
+
+	return (((u64)eaddr >> 12) & 0xffff) | (((u64)sre->vsid) << 16);
+}
+
+static void kvmppc_mmu_book3s_32_reset_msr(struct kvm_vcpu *vcpu)
+{
+	kvmppc_set_msr(vcpu, 0);
+}
+
+static hva_t kvmppc_mmu_book3s_32_get_pteg(struct kvmppc_vcpu_book3s *vcpu_book3s,
+				      struct kvmppc_sr *sre, gva_t eaddr,
+				      bool primary)
+{
+	u32 page, hash, pteg, htabmask;
+	hva_t r;
+
+	page = (eaddr & 0x0FFFFFFF) >> 12;
+	htabmask = ((vcpu_book3s->sdr1 & 0x1FF) << 16) | 0xFFC0;
+
+	hash = ((sre->vsid ^ page) << 6);
+	if (!primary)
+		hash = ~hash;
+	hash &= htabmask;
+
+	pteg = (vcpu_book3s->sdr1 & 0xffff0000) | hash;
+
+#ifdef DEBUG_MMU
+	printk(KERN_INFO "MMU: pc=0x%lx eaddr=0x%lx sdr1=0x%llx pteg=0x%x vsid=0x%x\n",
+			vcpu_book3s->vcpu.arch.pc, eaddr, vcpu_book3s->sdr1, pteg, sre->vsid);
+#endif
+
+	r = gfn_to_hva(vcpu_book3s->vcpu.kvm, pteg >> PAGE_SHIFT);
+	if (kvm_is_error_hva(r))
+		return r;
+	return r | (pteg & ~PAGE_MASK);
+}
+
+static u32 kvmppc_mmu_book3s_32_get_ptem(struct kvmppc_sr *sre, gva_t eaddr,
+				    bool primary)
+{
+	return ((eaddr & 0x0fffffff) >> 22) | (sre->vsid << 7) |
+	       (primary ? 0 : 0x40) | 0x80000000;
+}
+
+static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *pte, bool data)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_bat *bat;
+	int i;
+
+	for (i = 0; i < 8; i++) {
+		if (data)
+			bat = &vcpu_book3s->dbat[i];
+		else
+			bat = &vcpu_book3s->ibat[i];
+
+		if (vcpu->arch.msr & MSR_PR) {
+			if (!bat->vp)
+				continue;
+		} else {
+			if (!bat->vs)
+				continue;
+		}
+
+#ifdef DEBUG_MMU_PTE
+#ifdef DEBUG_MMU_PTE_IP
+	if (vcpu->arch.pc == DEBUG_MMU_PTE_IP)
+#endif
+	{
+	printk(KERN_INFO "%cBAT %02d: 0x%lx - 0x%x (0x%x)\n", data ? 'd' : 'i', i, eaddr, bat->bepi, bat->bepi_mask);
+	}
+#endif
+		if ((eaddr & bat->bepi_mask) == bat->bepi) {
+			pte->raddr = bat->brpn | (eaddr & ~bat->bepi_mask);
+			pte->vpage = (eaddr >> 12) | VSID_BAT;
+			pte->may_read = bat->pp;
+			pte->may_write = bat->pp > 1;
+			pte->may_execute = true;
+			if (!pte->may_read) {
+				printk(KERN_INFO "BAT is not readable!\n");
+				continue;
+			}
+			if (!pte->may_write) {
+				// let's treat r/o BATs as not-readable for now
+#ifdef DEBUG_MMU_PTE
+				printk(KERN_INFO "BAT is read-only!\n");
+#endif
+				continue;
+			}
+
+			return 0;
+		}
+	}
+
+	return -ENOENT;
+}
+
+static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr,
+				     struct kvmppc_pte *pte, bool data,
+				     bool primary)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_sr *sre;
+	hva_t ptegp;
+	u32 pteg[16];
+	u64 ptem = 0;
+	int i;
+	int found = 0;
+
+	sre = kvmppc_mmu_book3s_32_find_sr(vcpu_book3s, eaddr);
+
+#ifdef DEBUG_MMU_PTE
+	printk(KERN_INFO "SR 0x%lx: vsid=0x%x, raw=0x%x\n", eaddr >> 28,
+			sre->vsid, sre->raw);
+#endif
+
+	pte->vpage = kvmppc_mmu_book3s_32_ea_to_vp(vcpu, eaddr, data);
+
+	ptegp = kvmppc_mmu_book3s_32_get_pteg(vcpu_book3s, sre, eaddr, primary);
+	if (kvm_is_error_hva(ptegp)) {
+		printk(KERN_INFO "KVM: Invalid PTEG!\n");
+		goto no_page_found;
+	}
+
+	ptem = kvmppc_mmu_book3s_32_get_ptem(sre, eaddr, primary);
+
+	if(copy_from_user(pteg, (void __user *)ptegp, sizeof(pteg))) {
+		printk(KERN_ERR "KVM: Can't copy data from 0x%lx!\n", ptegp);
+		goto no_page_found;
+	}
+
+	for (i=0; i<16; i+=2) {
+		// match?
+		if (ptem == pteg[i]) {
+			u8 pp;
+
+			pte->raddr = (pteg[i+1] & ~(0xFFFULL)) | (eaddr & 0xFFF);
+			pp = pteg[i+1] & 3;
+
+			if ((sre->Kp &&  (vcpu->arch.msr & MSR_PR)) ||
+			    (sre->Ks && !(vcpu->arch.msr & MSR_PR)))
+				pp |= 4;
+
+			pte->may_write = false;
+			pte->may_read = false;
+			pte->may_execute = true;
+			switch (pp) {
+				case 0:
+				case 1:
+				case 2:
+				case 6:
+					pte->may_write = true;
+				case 3:
+				case 5:
+				case 7:
+					pte->may_read = true;
+					break;
+			}
+
+			if ( !pte->may_read )
+				continue;
+#ifdef DEBUG_MMU_PTE
+			printk("MMU: Found PTE -> %x %x - %x\n", pteg[i], pteg[i+1], pp);
+#endif
+			found = 1;
+			break;
+		}
+	}
+
+	// Update PTE C and A bits, so the guest's swapper knows we used the page
+	if (found) {
+		u32 oldpte = pteg[i+1];
+
+		if (pte->may_read)
+			pteg[i+1] |= 0x00000100; // Accessed flag
+		if (pte->may_write)
+			pteg[i+1] |= 0x00000080; // Dirty flag
+#ifdef DEBUG_MMU_PTE
+		else
+		printk(KERN_INFO "KVM: Mapping read-only page!\n");
+#endif
+
+		// Write back into the PTEG
+		if (pteg[i+1] != oldpte)
+			copy_to_user((void __user *)ptegp, pteg, sizeof(pteg));
+
+		return 0;
+	}
+
+no_page_found:
+
+#ifdef DEBUG_MMU_PTE
+#ifdef DEBUG_MMU_PTE_IP
+	if (vcpu->arch.pc == DEBUG_MMU_PTE_IP)
+#endif
+	{
+	printk(KERN_INFO "KVM MMU: No PTE found (sdr1=0x%llx ptegp=0x%lx)\n", to_book3s(vcpu)->sdr1, ptegp);
+	for (i=0; i<16; i+=2) {
+		printk(KERN_INFO "   %02d: 0x%x - 0x%x (0x%llx)\n", i, pteg[i], pteg[i+1], ptem);
+	}
+	}
+#endif
+
+	return -ENOENT;
+}
+
+static int kvmppc_mmu_book3s_32_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *pte, bool data)
+{
+	int r;
+
+	pte->eaddr = eaddr;
+	r = kvmppc_mmu_book3s_32_xlate_bat(vcpu, eaddr, pte, data);
+	if (r < 0)
+	       r = kvmppc_mmu_book3s_32_xlate_pte(vcpu, eaddr, pte, data, true);
+	if (r < 0)
+	       r = kvmppc_mmu_book3s_32_xlate_pte(vcpu, eaddr, pte, data, false);
+
+	return r;
+}
+
+
+static u32 kvmppc_mmu_book3s_32_mfsrin(struct kvm_vcpu *vcpu, u32 srnum)
+{
+	return to_book3s(vcpu)->sr[srnum].raw;
+}
+
+static void kvmppc_mmu_book3s_32_mtsrin(struct kvm_vcpu *vcpu, u32 srnum, ulong value)
+{
+	struct kvmppc_sr *sre;
+
+	sre = &to_book3s(vcpu)->sr[srnum];
+
+	/* Flush any left-over shadows from the previous SR */
+//	kvmppc_mmu_pte_flush(vcpu, ((u64)sre->vsid) << 28, 0xf0000000ULL);
+
+	/* And then put in the new SR */
+	sre->raw = value;
+	sre->vsid = (value & 0x0fffffff);
+	sre->Ks = (value & 0x40000000) ? true : false;
+	sre->Kp = (value & 0x20000000) ? true : false;
+	sre->nx = (value & 0x10000000) ? true : false;
+
+	/* Map the new segment */
+	kvmppc_mmu_map_segment(vcpu, srnum << SID_SHIFT);
+}
+
+static void kvmppc_mmu_book3s_32_tlbie(struct kvm_vcpu *vcpu, ulong ea, bool large)
+{
+	kvmppc_mmu_pte_flush(vcpu, ea, ~0xFFFULL);
+}
+
+static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, u64 esid,
+					     u64 *vsid)
+{
+	/* In case we only have one of MSR_IR or MSR_DR set, let's put
+	   that in the real-mode context (and hope RM doesn't access
+	   high memory) */
+	switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
+	case 0:
+		*vsid = (VSID_REAL >> 16) | esid;
+		break;
+	case MSR_IR:
+		*vsid = (VSID_REAL_IR >> 16) | esid;
+		break;
+	case MSR_DR:
+		*vsid = (VSID_REAL_DR >> 16) | esid;
+		break;
+	case MSR_DR|MSR_IR:
+	{
+		ulong ea;
+		ea = esid << SID_SHIFT;
+		*vsid = kvmppc_mmu_book3s_32_find_sr(to_book3s(vcpu), ea)->vsid;
+		break;
+	}
+	default:
+		BUG();
+	}
+
+	return 0;
+}
+
+static bool kvmppc_mmu_book3s_32_is_dcbz32(struct kvm_vcpu *vcpu)
+{
+	return true;
+}
+
+
+void kvmppc_mmu_book3s_32_init(struct kvm_vcpu *vcpu)
+{
+	struct kvmppc_mmu *mmu = &vcpu->arch.mmu;
+
+	mmu->mtsrin = kvmppc_mmu_book3s_32_mtsrin;
+	mmu->mfsrin = kvmppc_mmu_book3s_32_mfsrin;
+	mmu->xlate = kvmppc_mmu_book3s_32_xlate;
+	mmu->reset_msr = kvmppc_mmu_book3s_32_reset_msr;
+	mmu->tlbie = kvmppc_mmu_book3s_32_tlbie;
+	mmu->esid_to_vsid = kvmppc_mmu_book3s_32_esid_to_vsid;
+	mmu->ea_to_vp = kvmppc_mmu_book3s_32_ea_to_vp;
+	mmu->is_dcbz32 = kvmppc_mmu_book3s_32_is_dcbz32;
+
+	mmu->slbmte = NULL;
+	mmu->slbmfee = NULL;
+	mmu->slbmfev = NULL;
+	mmu->slbie = NULL;
+	mmu->slbia = NULL;
+}
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 14/27] Add book3s_64 specific opcode emulation
       [not found]                                     ` <1256137413-15256-14-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
@ 2009-10-21 15:03                                         ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti

There are generic parts of PowerPC that can be shared across all
implementations and specific parts that only apply to BookE or desktop PPCs.

This patch adds emulation for desktop-specific opcodes that don't apply
to BookE CPUs.
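
The emulation below dispatches on the primary and extended opcode fields of
the instruction word. A minimal sketch of that split (the example_* helpers
are invented for illustration; the real accessors come from asm/disassemble.h):

    #include <stdint.h>

    /* Sketch: splitting a PowerPC instruction word for the dispatch below. */
    static inline uint32_t example_get_op(uint32_t inst)
    {
        return inst >> 26;            /* primary opcode, bits 0-5 (IBM bit numbering) */
    }

    static inline uint32_t example_get_xop(uint32_t inst)
    {
        return (inst >> 1) & 0x3ff;   /* extended opcode of X-form instructions, bits 21-30 */
    }

The OP_31_XOP_* constants defined at the top of the new file are exactly these
10-bit extended opcodes under primary opcode 31.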

Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
---
 arch/powerpc/kvm/book3s_64_emulate.c |  338 ++++++++++++++++++++++++++++++++++
 1 files changed, 338 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_emulate.c

diff --git a/arch/powerpc/kvm/book3s_64_emulate.c b/arch/powerpc/kvm/book3s_64_emulate.c
new file mode 100644
index 0000000..60cd64a
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_emulate.c
@@ -0,0 +1,338 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
+ */
+
+#include <asm/kvm_ppc.h>
+#include <asm/disassemble.h>
+#include <asm/kvm_book3s.h>
+#include <asm/reg.h>
+
+#define OP_19_XOP_RFID		18
+#define OP_19_XOP_RFI		50
+
+#define OP_31_XOP_MFMSR		83
+#define OP_31_XOP_MTMSR		146
+#define OP_31_XOP_MTMSRD	178
+#define OP_31_XOP_MTSRIN	242
+#define OP_31_XOP_TLBIEL	274
+#define OP_31_XOP_TLBIE		306
+#define OP_31_XOP_SLBMTE	402
+#define OP_31_XOP_SLBIE		434
+#define OP_31_XOP_SLBIA		498
+#define OP_31_XOP_MFSRIN	659
+#define OP_31_XOP_SLBMFEV	851
+#define OP_31_XOP_EIOIO		854
+#define OP_31_XOP_SLBMFEE	915
+// DCBZ is actually 1014, but we patch it to 1010 so we get a trap
+#define OP_31_XOP_DCBZ		1010
+
+int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
+                           unsigned int inst, int *advance)
+{
+	int emulated = EMULATE_DONE;
+
+	switch (get_op(inst)) {
+	case 19:
+		switch (get_xop(inst)) {
+		case OP_19_XOP_RFID:
+		case OP_19_XOP_RFI:
+			vcpu->arch.pc = vcpu->arch.srr0;
+			kvmppc_set_msr(vcpu, vcpu->arch.srr1);
+			*advance = 0;
+			break;
+
+		default:
+			emulated = EMULATE_FAIL;
+			break;
+		}
+		break;
+	case 31:
+		switch (get_xop(inst)) {
+		case OP_31_XOP_MFMSR:
+			vcpu->arch.gpr[get_rt(inst)] = vcpu->arch.msr;
+			break;
+		case OP_31_XOP_MTMSRD:
+		{
+			ulong rs = vcpu->arch.gpr[get_rs(inst)];
+			if (inst & 0x10000) {
+				vcpu->arch.msr &= ~(MSR_RI | MSR_EE);
+				vcpu->arch.msr |= rs & (MSR_RI | MSR_EE);
+			} else
+				kvmppc_set_msr(vcpu, rs);
+			break;
+		}
+		case OP_31_XOP_MTMSR:
+			kvmppc_set_msr(vcpu, vcpu->arch.gpr[get_rs(inst)]);
+			break;
+		case OP_31_XOP_MFSRIN:
+		{
+			int srnum;
+
+			srnum = (vcpu->arch.gpr[get_rb(inst)] >> 28) & 0xf;
+			if (vcpu->arch.mmu.mfsrin) {
+				u32 sr;
+				sr = vcpu->arch.mmu.mfsrin(vcpu, srnum);
+				vcpu->arch.gpr[get_rt(inst)] = sr;
+			}
+			break;
+		}
+		case OP_31_XOP_MTSRIN:
+			vcpu->arch.mmu.mtsrin(vcpu,
+				(vcpu->arch.gpr[get_rb(inst)] >> 28) & 0xf,
+				vcpu->arch.gpr[get_rs(inst)]);
+			break;
+		case OP_31_XOP_TLBIE:
+		case OP_31_XOP_TLBIEL:
+		{
+			bool large = (inst & 0x00200000) ? true : false;
+			ulong addr = vcpu->arch.gpr[get_rb(inst)];
+			vcpu->arch.mmu.tlbie(vcpu, addr, large);
+			break;
+		}
+		case OP_31_XOP_EIOIO:
+			break;
+		case OP_31_XOP_SLBMTE:
+			if (!vcpu->arch.mmu.slbmte)
+				return EMULATE_FAIL;
+
+			vcpu->arch.mmu.slbmte(vcpu, vcpu->arch.gpr[get_rs(inst)],
+						vcpu->arch.gpr[get_rb(inst)]);
+			break;
+		case OP_31_XOP_SLBIE:
+			if (!vcpu->arch.mmu.slbie)
+				return EMULATE_FAIL;
+
+			vcpu->arch.mmu.slbie(vcpu, vcpu->arch.gpr[get_rb(inst)]);
+			break;
+		case OP_31_XOP_SLBIA:
+			if (!vcpu->arch.mmu.slbia)
+				return EMULATE_FAIL;
+
+			vcpu->arch.mmu.slbia(vcpu);
+			break;
+		case OP_31_XOP_SLBMFEE:
+			if (!vcpu->arch.mmu.slbmfee) {
+				emulated = EMULATE_FAIL;
+			} else {
+				ulong t, rb;
+
+				rb = vcpu->arch.gpr[get_rb(inst)];
+				t = vcpu->arch.mmu.slbmfee(vcpu, rb);
+				vcpu->arch.gpr[get_rt(inst)] = t;
+			}
+			break;
+		case OP_31_XOP_SLBMFEV:
+			if (!vcpu->arch.mmu.slbmfev) {
+				emulated = EMULATE_FAIL;
+			} else {
+				ulong t, rb;
+
+				rb = vcpu->arch.gpr[get_rb(inst)];
+				t = vcpu->arch.mmu.slbmfev(vcpu, rb);
+				vcpu->arch.gpr[get_rt(inst)] = t;
+			}
+			break;
+		case OP_31_XOP_DCBZ:
+		{
+			ulong rb =  vcpu->arch.gpr[get_rb(inst)];
+			ulong ra = 0;
+			ulong addr;
+			u32 zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
+
+			if (get_ra(inst))
+				ra = vcpu->arch.gpr[get_ra(inst)];
+
+			addr = (ra + rb) & ~31ULL;
+			if (!(vcpu->arch.msr & MSR_SF))
+				addr &= 0xffffffff;
+
+			if (kvmppc_st(vcpu, addr, 32, zeros)) {
+				vcpu->arch.dear = addr;
+				vcpu->arch.fault_dear = addr;
+				to_book3s(vcpu)->dsisr = DSISR_PROTFAULT |
+						      DSISR_ISSTORE;
+				kvmppc_book3s_queue_irqprio(vcpu,
+					BOOK3S_INTERRUPT_DATA_STORAGE);
+				kvmppc_mmu_pte_flush(vcpu, addr, ~0xFFFULL);
+			}
+
+			break;
+		}
+		default:
+			emulated = EMULATE_FAIL;
+		}
+		break;
+	default:
+		emulated = EMULATE_FAIL;
+	}
+
+	return emulated;
+}
+
+static void kvmppc_write_bat(struct kvm_vcpu *vcpu, int sprn, u64 val)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_bat *bat;
+
+	switch (sprn) {
+	case SPRN_IBAT0U ... SPRN_IBAT3L:
+		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT0U) / 2];
+		break;
+	case SPRN_IBAT4U ... SPRN_IBAT7L:
+		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT4U) / 2];
+		break;
+	case SPRN_DBAT0U ... SPRN_DBAT3L:
+		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT0U) / 2];
+		break;
+	case SPRN_DBAT4U ... SPRN_DBAT7L:
+		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT4U) / 2];
+		break;
+	default:
+		BUG();
+	}
+
+	if (!(sprn % 2)) {
+		// Upper BAT
+		u32 bl = (val >> 2) & 0x7ff;
+		bat->bepi_mask = (~bl << 17);
+		bat->bepi = val & 0xfffe0000;
+		bat->vs = (val & 2) ? 1 : 0;
+		bat->vp = (val & 1) ? 1 : 0;
+	} else {
+		// Lower BAT
+		bat->brpn = val & 0xfffe0000;
+		bat->wimg = (val >> 3) & 0xf;
+		bat->pp = val & 3;
+	}
+}
+
+int kvmppc_core_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, int rs)
+{
+	//struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	int emulated = EMULATE_DONE;
+
+	switch (sprn) {
+	case SPRN_SDR1:
+		to_book3s(vcpu)->sdr1 = vcpu->arch.gpr[rs];
+		break;
+	case SPRN_DSISR:
+		to_book3s(vcpu)->dsisr = vcpu->arch.gpr[rs];
+		break;
+	case SPRN_DAR:
+		vcpu->arch.dear = vcpu->arch.gpr[rs];
+		break;
+	case SPRN_HIOR:
+		to_book3s(vcpu)->hior = vcpu->arch.gpr[rs];
+		break;
+	case SPRN_IBAT0U ... SPRN_IBAT3L:
+	case SPRN_IBAT4U ... SPRN_IBAT7L:
+	case SPRN_DBAT0U ... SPRN_DBAT3L:
+	case SPRN_DBAT4U ... SPRN_DBAT7L:
+		kvmppc_write_bat(vcpu, sprn, vcpu->arch.gpr[rs]);
+		// BAT writes happen so rarely that we're ok to flush
+		// everything here
+		kvmppc_mmu_pte_flush(vcpu, 0, 0);
+		break;
+	case SPRN_HID0:
+		to_book3s(vcpu)->hid[0] = vcpu->arch.gpr[rs];
+		break;
+	case SPRN_HID1:
+		to_book3s(vcpu)->hid[1] = vcpu->arch.gpr[rs];
+		break;
+	case SPRN_HID2:
+		to_book3s(vcpu)->hid[2] = vcpu->arch.gpr[rs];
+		break;
+	case SPRN_HID4:
+		to_book3s(vcpu)->hid[4] = vcpu->arch.gpr[rs];
+		break;
+	case SPRN_HID5:
+		to_book3s(vcpu)->hid[5] = vcpu->arch.gpr[rs];
+		/* guest HID5 set can change is_dcbz32 */
+		if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
+		    (mfmsr() & MSR_HV))
+			vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
+		break;
+	case SPRN_ICTC:
+	case SPRN_THRM1:
+	case SPRN_THRM2:
+	case SPRN_THRM3:
+	case SPRN_CTRLF:
+	case SPRN_CTRLT:
+		break;
+	default:
+		printk(KERN_INFO "KVM: invalid SPR write: %d\n", sprn);
+#ifndef DEBUG_SPR
+		emulated = EMULATE_FAIL;
+#endif
+		break;
+	}
+
+	return emulated;
+}
+
+int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt)
+{
+	//struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	int emulated = EMULATE_DONE;
+
+	switch (sprn) {
+	case SPRN_SDR1:
+		vcpu->arch.gpr[rt] = to_book3s(vcpu)->sdr1;
+		break;
+	case SPRN_DSISR:
+		vcpu->arch.gpr[rt] = to_book3s(vcpu)->dsisr;
+		break;
+	case SPRN_DAR:
+		vcpu->arch.gpr[rt] = vcpu->arch.dear;
+		break;
+	case SPRN_HIOR:
+		vcpu->arch.gpr[rt] = to_book3s(vcpu)->hior;
+		break;
+	case SPRN_HID0:
+		vcpu->arch.gpr[rt] = to_book3s(vcpu)->hid[0];
+		break;
+	case SPRN_HID1:
+		vcpu->arch.gpr[rt] = to_book3s(vcpu)->hid[1];
+		break;
+	case SPRN_HID2:
+		vcpu->arch.gpr[rt] = to_book3s(vcpu)->hid[2];
+		break;
+	case SPRN_HID4:
+		vcpu->arch.gpr[rt] = to_book3s(vcpu)->hid[4];
+		break;
+	case SPRN_HID5:
+		vcpu->arch.gpr[rt] = to_book3s(vcpu)->hid[5];
+		break;
+	case SPRN_THRM1:
+	case SPRN_THRM2:
+	case SPRN_THRM3:
+	case SPRN_CTRLF:
+	case SPRN_CTRLT:
+		vcpu->arch.gpr[rt] = 0;
+		break;
+	default:
+		printk(KERN_INFO "KVM: invalid SPR read: %d\n", sprn);
+#ifndef DEBUG_SPR
+		emulated = EMULATE_FAIL;
+#endif
+		break;
+	}
+
+	return emulated;
+}
+
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 15/27] Add mfdec emulation
       [not found]                                         ` <1256137413-15256-15-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
@ 2009-10-21 15:03                                             ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti

We support setting the DEC to a certain value right now. Doing that arms
the CPU-local decrementer timer.

But there's also an mfdec operation that enables the OS to read the
decrementer.

This is required at least by all desktop and server PowerPC Linux kernels. It
can't really hurt to allow embedded ones to do it as well, though.
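
In other words, an emulated DEC read has to return the value the guest last
wrote minus the number of timebase ticks that have elapsed since that write.
A rough standalone sketch of that arithmetic (names invented for the example;
the patch keeps the snapshot in vcpu->arch.dec_jiffies):

    #include <stdint.h>

    /* Sketch: deriving the current guest DEC value from the last write. */
    static uint32_t example_read_dec(uint32_t dec_at_write,  /* value the guest last wrote */
                                     uint64_t tb_at_write,   /* timebase snapshot taken then */
                                     uint64_t tb_now)        /* timebase now */
    {
        uint64_t elapsed = tb_now - tb_at_write;   /* DEC ticks at the timebase rate */

        return dec_at_write - (uint32_t)elapsed;   /* counts down; wraps once it passes zero */
    }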

Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
---
 arch/powerpc/kvm/emulate.c |   13 ++++++++++++-
 1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index 7737146..50d411d 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -66,12 +66,14 @@
 
 void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
 {
+	unsigned long nr_jiffies;
+
 	if (vcpu->arch.tcr & TCR_DIE) {
 		/* The decrementer ticks at the same rate as the timebase, so
 		 * that's how we convert the guest DEC value to the number of
 		 * host ticks. */
-		unsigned long nr_jiffies;
 
+		vcpu->arch.dec_jiffies = mftb();
 		nr_jiffies = vcpu->arch.dec / tb_ticks_per_jiffy;
 		mod_timer(&vcpu->arch.dec_timer,
 		          get_jiffies_64() + nr_jiffies);
@@ -211,6 +213,15 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
 			/* Note: SPRG4-7 are user-readable, so we don't get
 			 * a trap. */
 
+			case SPRN_DEC:
+			{
+				u64 jd = mftb() - vcpu->arch.dec_jiffies;
+				vcpu->arch.gpr[rt] = vcpu->arch.dec - jd;
+#ifdef DEBUG_EMUL
+				printk(KERN_INFO "mfDEC: %x - %llx = %lx\n", vcpu->arch.dec, jd, vcpu->arch.gpr[rt]);
+#endif
+				break;
+			}
 			default:
 				emulated = kvmppc_core_emulate_mfspr(vcpu, sprn, rt);
 				if (emulated == EMULATE_FAIL) {
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 16/27] Add desktop PowerPC specific emulation
       [not found]                                             ` <1256137413-15256-16-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
@ 2009-10-21 15:03                                                 ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti

A few opcodes behave differently on desktop and embedded PowerPC cores.
In order to reflect those differences, let's add some #ifdef code to emulate.c.

We could probably also handle them in the core-specific emulation files, but I
would prefer to reuse as much code as possible.
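
The decrementer is a good example of such a difference: BookE only raises the
interrupt when TCR[DIE] is set, while desktop and server cores always run the
DEC and take the interrupt once the value goes negative. A hedged standalone
sketch of the enable check the patch introduces (illustrative names; the real
TCR_DIE constant lives in reg_booke.h):

    #include <stdbool.h>
    #include <stdint.h>

    #define EXAMPLE_TCR_DIE 0x04000000u    /* assumed value of the BookE DEC-interrupt-enable bit */

    /* Sketch: is the emulated decrementer allowed to raise an interrupt at all? */
    static bool example_dec_enabled(bool book3s, uint32_t tcr)
    {
        if (book3s)
            return true;                      /* desktop/server: the DEC always runs */
        return (tcr & EXAMPLE_TCR_DIE) != 0;  /* BookE: gated by TCR[DIE] */
    }

The extra negative-value check in the 64-bit path of kvmppc_emulate_dec() then
queues the interrupt right away instead of arming a host timer.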

Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>

---

v4 -> v5:

  - use get_tb instead of mftb
  - make ppc32 and ppc64 emulation share more code
---
 arch/powerpc/kvm/emulate.c |   49 +++++++++++++++++++++++++++++++++++---------
 1 files changed, 39 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index 50d411d..1ec5e07 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -32,6 +32,7 @@
 #include "trace.h"
 
 #define OP_TRAP 3
+#define OP_TRAP_64 2
 
 #define OP_31_XOP_LWZX      23
 #define OP_31_XOP_LBZX      87
@@ -64,16 +65,36 @@
 #define OP_STH  44
 #define OP_STHU 45
 
+#ifdef CONFIG_PPC64
+static int kvmppc_dec_enabled(struct kvm_vcpu *vcpu)
+{
+	return 1;
+}
+#else
+static int kvmppc_dec_enabled(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.tcr & TCR_DIE;
+}
+#endif
+
 void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
 {
 	unsigned long nr_jiffies;
 
-	if (vcpu->arch.tcr & TCR_DIE) {
+#ifdef CONFIG_PPC64
+	/* POWER4+ triggers a dec interrupt if the value is < 0 */
+	if (vcpu->arch.dec & 0x80000000) {
+		del_timer(&vcpu->arch.dec_timer);
+		kvmppc_core_queue_dec(vcpu);
+		return;
+	}
+#endif
+	if (kvmppc_dec_enabled(vcpu)) {
 		/* The decrementer ticks at the same rate as the timebase, so
 		 * that's how we convert the guest DEC value to the number of
 		 * host ticks. */
 
-		vcpu->arch.dec_jiffies = mftb();
+		vcpu->arch.dec_jiffies = get_tb();
 		nr_jiffies = vcpu->arch.dec / tb_ticks_per_jiffy;
 		mod_timer(&vcpu->arch.dec_timer,
 		          get_jiffies_64() + nr_jiffies);
@@ -113,9 +134,15 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
 	/* this default type might be overwritten by subcategories */
 	kvmppc_set_exit_type(vcpu, EMULATED_INST_EXITS);
 
+	pr_debug(KERN_INFO "Emulating opcode %d / %d\n", get_op(inst), get_xop(inst));
+
 	switch (get_op(inst)) {
 	case OP_TRAP:
+#ifdef CONFIG_PPC64
+	case OP_TRAP_64:
+#else
 		vcpu->arch.esr |= ESR_PTR;
+#endif
 		kvmppc_core_queue_program(vcpu);
 		advance = 0;
 		break;
@@ -190,17 +217,19 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
 			case SPRN_SRR1:
 				vcpu->arch.gpr[rt] = vcpu->arch.srr1; break;
 			case SPRN_PVR:
-				vcpu->arch.gpr[rt] = mfspr(SPRN_PVR); break;
+				vcpu->arch.gpr[rt] = vcpu->arch.pvr; break;
 			case SPRN_PIR:
-				vcpu->arch.gpr[rt] = mfspr(SPRN_PIR); break;
+				vcpu->arch.gpr[rt] = vcpu->vcpu_id; break;
+			case SPRN_MSSSR0:
+				vcpu->arch.gpr[rt] = 0; break;
 
 			/* Note: mftb and TBRL/TBWL are user-accessible, so
 			 * the guest can always access the real TB anyways.
 			 * In fact, we probably will never see these traps. */
 			case SPRN_TBWL:
-				vcpu->arch.gpr[rt] = mftbl(); break;
+				vcpu->arch.gpr[rt] = get_tb() >> 32; break;
 			case SPRN_TBWU:
-				vcpu->arch.gpr[rt] = mftbu(); break;
+				vcpu->arch.gpr[rt] = get_tb(); break;
 
 			case SPRN_SPRG0:
 				vcpu->arch.gpr[rt] = vcpu->arch.sprg0; break;
@@ -215,11 +244,9 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
 
 			case SPRN_DEC:
 			{
-				u64 jd = mftb() - vcpu->arch.dec_jiffies;
+				u64 jd = get_tb() - vcpu->arch.dec_jiffies;
 				vcpu->arch.gpr[rt] = vcpu->arch.dec - jd;
-#ifdef DEBUG_EMUL
-				printk(KERN_INFO "mfDEC: %x - %llx = %lx\n", vcpu->arch.dec, jd, vcpu->arch.gpr[rt]);
-#endif
+				pr_debug(KERN_INFO "mfDEC: %x - %llx = %lx\n", vcpu->arch.dec, jd, vcpu->arch.gpr[rt]);
 				break;
 			}
 			default:
@@ -271,6 +298,8 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
 			case SPRN_TBWL: break;
 			case SPRN_TBWU: break;
 
+			case SPRN_MSSSR0: break;
+
 			case SPRN_DEC:
 				vcpu->arch.dec = vcpu->arch.gpr[rs];
 				kvmppc_emulate_dec(vcpu);
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 17/27] Make head_64.S aware of KVM real mode code
       [not found]                                                 ` <1256137413-15256-17-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
@ 2009-10-21 15:03                                                     ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti

We need to run some KVM trampoline code in real mode. Unfortunately, real mode
only covers 8MB on Cell, so we need to squeeze ourselves as low as possible.

Also, we need to trap interrupts to get us back from guest state to host state
without telling Linux about it.

This patch adds interrupt traps and includes the KVM code that requires real
mode in the real mode parts of Linux.

Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
---
 arch/powerpc/include/asm/exception-64s.h |    2 ++
 arch/powerpc/kernel/exceptions-64s.S     |    8 ++++++++
 arch/powerpc/kernel/head_64.S            |    7 +++++++
 3 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index a98653b..57c4000 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -147,6 +147,7 @@
 	.globl label##_pSeries;				\
 label##_pSeries:					\
 	HMT_MEDIUM;					\
+	DO_KVM	n;					\
 	mtspr	SPRN_SPRG_SCRATCH0,r13;		/* save r13 */	\
 	EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, label##_common)
 
@@ -170,6 +171,7 @@ label##_pSeries:					\
 	.globl label##_pSeries;						\
 label##_pSeries:							\
 	HMT_MEDIUM;							\
+	DO_KVM	n;							\
 	mtspr	SPRN_SPRG_SCRATCH0,r13;	/* save r13 */			\
 	mfspr	r13,SPRN_SPRG_PACA;	/* get paca address into r13 */	\
 	std	r9,PACA_EXGEN+EX_R9(r13);	/* save r9, r10 */	\
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 1808876..fc3ead0 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -41,6 +41,7 @@ __start_interrupts:
 	. = 0x200
 _machine_check_pSeries:
 	HMT_MEDIUM
+	DO_KVM	0x200
 	mtspr	SPRN_SPRG_SCRATCH0,r13		/* save r13 */
 	EXCEPTION_PROLOG_PSERIES(PACA_EXMC, machine_check_common)
 
@@ -48,6 +49,7 @@ _machine_check_pSeries:
 	.globl data_access_pSeries
 data_access_pSeries:
 	HMT_MEDIUM
+	DO_KVM	0x300
 	mtspr	SPRN_SPRG_SCRATCH0,r13
 BEGIN_FTR_SECTION
 	mfspr	r13,SPRN_SPRG_PACA
@@ -77,6 +79,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_SLB)
 	.globl data_access_slb_pSeries
 data_access_slb_pSeries:
 	HMT_MEDIUM
+	DO_KVM	0x380
 	mtspr	SPRN_SPRG_SCRATCH0,r13
 	mfspr	r13,SPRN_SPRG_PACA		/* get paca address into r13 */
 	std	r3,PACA_EXSLB+EX_R3(r13)
@@ -115,6 +118,7 @@ data_access_slb_pSeries:
 	.globl instruction_access_slb_pSeries
 instruction_access_slb_pSeries:
 	HMT_MEDIUM
+	DO_KVM	0x480
 	mtspr	SPRN_SPRG_SCRATCH0,r13
 	mfspr	r13,SPRN_SPRG_PACA		/* get paca address into r13 */
 	std	r3,PACA_EXSLB+EX_R3(r13)
@@ -154,6 +158,7 @@ instruction_access_slb_pSeries:
 	.globl	system_call_pSeries
 system_call_pSeries:
 	HMT_MEDIUM
+	DO_KVM	0xc00
 BEGIN_FTR_SECTION
 	cmpdi	r0,0x1ebe
 	beq-	1f
@@ -186,12 +191,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	 * trickery is thus necessary
 	 */
 	. = 0xf00
+	DO_KVM	0xf00
 	b	performance_monitor_pSeries
 
 	. = 0xf20
+	DO_KVM	0xf20
 	b	altivec_unavailable_pSeries
 
 	. = 0xf40
+	DO_KVM	0xf40
 	b	vsx_unavailable_pSeries
 
 #ifdef CONFIG_CBE_RAS
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index c38afdb..9258074 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -37,6 +37,7 @@
 #include <asm/firmware.h>
 #include <asm/page_64.h>
 #include <asm/irqflags.h>
+#include <asm/kvm_book3s_64_asm.h>
 
 /* The physical memory is layed out such that the secondary processor
  * spin code sits at 0x0000...0x00ff. On server, the vectors follow
@@ -165,6 +166,12 @@ exception_marker:
 #include "exceptions-64s.S"
 #endif
 
+/* KVM trampoline code needs to be close to the interrupt handlers */
+
+#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
+#include "../kvm/book3s_64_rmhandlers.S"
+#endif
+
 _GLOBAL(generic_secondary_thread_init)
 	mr	r24,r3
 
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread
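
The DO_KVM hook dropped into each vector above is an assembly macro defined
elsewhere in this series. As a rough C-level sketch of the decision it makes,
under the assumption that it keys off the kvm_in_guest PACA flag added later
in this patch set (the struct and helper name below are made up):

	struct paca_sketch {
		unsigned char kvm_in_guest;	/* mirrors paca_struct.kvm_in_guest */
	};

	static int take_kvm_exit(const struct paca_sketch *paca)
	{
		/* Interrupt arrived while a guest was running: divert to the
		 * KVM low-memory trampoline instead of the normal Linux
		 * exception prolog. */
		return paca->kvm_in_guest != 0;
	}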

* [PATCH 18/27] Add Book3s_64 offsets to asm-offsets.c
  2009-10-21 15:03                                                     ` Alexander Graf
@ 2009-10-21 15:03                                                       ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti

We need to access some VCPU fields from assembly code. In order to get
the proper offsets, we have to define them in asm-offsets.c.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kernel/asm-offsets.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 0812b0f..aba3ea6 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -398,6 +398,19 @@ int main(void)
 	DEFINE(VCPU_LAST_INST, offsetof(struct kvm_vcpu, arch.last_inst));
 	DEFINE(VCPU_FAULT_DEAR, offsetof(struct kvm_vcpu, arch.fault_dear));
 	DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr));
+
+	/* book3s_64 */
+#ifdef CONFIG_PPC64
+	DEFINE(VCPU_FAULT_DSISR, offsetof(struct kvm_vcpu, arch.fault_dsisr));
+	DEFINE(VCPU_HOST_RETIP, offsetof(struct kvm_vcpu, arch.host_retip));
+	DEFINE(VCPU_HOST_R2, offsetof(struct kvm_vcpu, arch.host_r2));
+	DEFINE(VCPU_HOST_MSR, offsetof(struct kvm_vcpu, arch.host_msr));
+	DEFINE(VCPU_SHADOW_MSR, offsetof(struct kvm_vcpu, arch.shadow_msr));
+	DEFINE(VCPU_TRAMPOLINE_LOWMEM, offsetof(struct kvm_vcpu, arch.trampoline_lowmem));
+	DEFINE(VCPU_TRAMPOLINE_ENTER, offsetof(struct kvm_vcpu, arch.trampoline_enter));
+	DEFINE(VCPU_HIGHMEM_HANDLER, offsetof(struct kvm_vcpu, arch.highmem_handler));
+	DEFINE(VCPU_HFLAGS, offsetof(struct kvm_vcpu, arch.hflags));
+#endif
 #endif
 #ifdef CONFIG_44x
 	DEFINE(PGD_T_LOG2, PGD_T_LOG2);
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread
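
For readers unfamiliar with the asm-offsets mechanism, here is a condensed
sketch of how one of these constants comes into being and gets consumed. The
DEFINE() pattern follows the usual kbuild idiom; the stand-in struct and the
register choices in the comment are assumptions, not lines from this series.

	#include <stddef.h>

	struct kvm_vcpu_sketch {		/* stand-in for struct kvm_vcpu */
		struct {
			unsigned long shadow_msr;
		} arch;
	};

	/* asm-offsets.c compiles this into an assembly marker that kbuild
	 * post-processes into "#define VCPU_SHADOW_MSR <offset>". */
	#define DEFINE(sym, val) \
		asm volatile("\n->" #sym " %0 " #val : : "i" (val))

	void emit_offsets(void)
	{
		DEFINE(VCPU_SHADOW_MSR,
		       offsetof(struct kvm_vcpu_sketch, arch.shadow_msr));
	}

	/* The world-switch assembly can then do, for example:
	 *	ld	r5, VCPU_SHADOW_MSR(r4)		(r4 = vcpu pointer)
	 */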

* [PATCH 19/27] Export symbols for KVM module
       [not found]                                                       ` <1256137413-15256-19-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
@ 2009-10-21 15:03                                                           ` Alexander Graf
  2009-10-29  2:45                                                           ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti

We want to be able to build KVM as a module. To enable us to do so, we
need some more exports from core Linux parts.

This patch exports all functions and variables that are required for KVM.

Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>

---

v3 -> v4:

  - don't export switch_slb
  - don't export init_context
  - don't export mm_alloc
---
 arch/powerpc/kernel/ppc_ksyms.c |    3 ++-
 arch/powerpc/kernel/time.c      |    1 +
 arch/powerpc/mm/hash_utils_64.c |    2 ++
 3 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
index c8b27bb..baf778c 100644
--- a/arch/powerpc/kernel/ppc_ksyms.c
+++ b/arch/powerpc/kernel/ppc_ksyms.c
@@ -163,11 +163,12 @@ EXPORT_SYMBOL(screen_info);
 #ifdef CONFIG_PPC32
 EXPORT_SYMBOL(timer_interrupt);
 EXPORT_SYMBOL(irq_desc);
-EXPORT_SYMBOL(tb_ticks_per_jiffy);
 EXPORT_SYMBOL(cacheable_memcpy);
 EXPORT_SYMBOL(cacheable_memzero);
 #endif
 
+EXPORT_SYMBOL(tb_ticks_per_jiffy);
+
 #ifdef CONFIG_PPC32
 EXPORT_SYMBOL(switch_mmu_context);
 #endif
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 92dc844..e05f6af 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -268,6 +268,7 @@ void account_system_vtime(struct task_struct *tsk)
 	per_cpu(cputime_scaled_last_delta, smp_processor_id()) = deltascaled;
 	local_irq_restore(flags);
 }
+EXPORT_SYMBOL_GPL(account_system_vtime);
 
 /*
  * Transfer the user and system times accumulated in the paca
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 1ade7eb..2b2a4aa 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -92,6 +92,7 @@ struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT];
 struct hash_pte *htab_address;
 unsigned long htab_size_bytes;
 unsigned long htab_hash_mask;
+EXPORT_SYMBOL_GPL(htab_hash_mask);
 int mmu_linear_psize = MMU_PAGE_4K;
 int mmu_virtual_psize = MMU_PAGE_4K;
 int mmu_vmalloc_psize = MMU_PAGE_4K;
@@ -102,6 +103,7 @@ int mmu_io_psize = MMU_PAGE_4K;
 int mmu_kernel_ssize = MMU_SEGSIZE_256M;
 int mmu_highuser_ssize = MMU_SEGSIZE_256M;
 u16 mmu_slb_size = 64;
+EXPORT_SYMBOL_GPL(mmu_slb_size);
 #ifdef CONFIG_HUGETLB_PAGE
 unsigned int HPAGE_SHIFT;
 #endif
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 20/27] Split init_new_context and destroy_context
       [not found]                                                           ` <1256137413-15256-20-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
@ 2009-10-21 15:03                                                               ` Alexander Graf
  2009-10-29  2:46                                                               ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti

For KVM we need to allocate a new context id, but don't really care about
all the mm context around it.

So let's split the alloc and destroy functions for the context id, so we can
grab one without allocating an mm context.

Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
---
 arch/powerpc/include/asm/mmu_context.h |    5 +++++
 arch/powerpc/mm/mmu_context_hash64.c   |   24 +++++++++++++++++++++---
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index b34e94d..66b35d0 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -23,6 +23,11 @@ extern void switch_slb(struct task_struct *tsk, struct mm_struct *mm);
 extern void set_context(unsigned long id, pgd_t *pgd);
 
 #ifdef CONFIG_PPC_BOOK3S_64
+extern int __init_new_context(void);
+extern void __destroy_context(int context_id);
+#endif
+
+#ifdef CONFIG_PPC_BOOK3S_64
 static inline void mmu_context_init(void) { }
 #else
 extern void mmu_context_init(void);
diff --git a/arch/powerpc/mm/mmu_context_hash64.c b/arch/powerpc/mm/mmu_context_hash64.c
index dbeb86a..b9e4cc2 100644
--- a/arch/powerpc/mm/mmu_context_hash64.c
+++ b/arch/powerpc/mm/mmu_context_hash64.c
@@ -18,6 +18,7 @@
 #include <linux/mm.h>
 #include <linux/spinlock.h>
 #include <linux/idr.h>
+#include <linux/module.h>
 
 #include <asm/mmu_context.h>
 
@@ -32,7 +33,7 @@ static DEFINE_IDR(mmu_context_idr);
 #define NO_CONTEXT	0
 #define MAX_CONTEXT	((1UL << 19) - 1)
 
-int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
+int __init_new_context(void)
 {
 	int index;
 	int err;
@@ -57,6 +58,18 @@ again:
 		return -ENOMEM;
 	}
 
+	return index;
+}
+EXPORT_SYMBOL_GPL(__init_new_context);
+
+int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
+{
+	int index;
+
+	index = __init_new_context();
+	if (index < 0)
+		return index;
+
 	/* The old code would re-promote on fork, we don't do that
 	 * when using slices as it could cause problem promoting slices
 	 * that have been forced down to 4K
@@ -68,11 +81,16 @@ again:
 	return 0;
 }
 
-void destroy_context(struct mm_struct *mm)
+void __destroy_context(int context_id)
 {
 	spin_lock(&mmu_context_lock);
-	idr_remove(&mmu_context_idr, mm->context.id);
+	idr_remove(&mmu_context_idr, context_id);
 	spin_unlock(&mmu_context_lock);
+}
+EXPORT_SYMBOL_GPL(__destroy_context);
 
+void destroy_context(struct mm_struct *mm)
+{
+	__destroy_context(mm->context.id);
 	mm->context.id = NO_CONTEXT;
 }
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread
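
A short usage sketch, assuming only the two helpers declared above: a module
such as KVM can now obtain and release a bare context id without building an
mm context around it. The wrapper names and struct are illustrative.

	extern int __init_new_context(void);
	extern void __destroy_context(int context_id);

	struct kvm_ctx_sketch {
		int context_id;
	};

	static int kvm_ctx_alloc(struct kvm_ctx_sketch *ctx)
	{
		int id = __init_new_context();

		if (id < 0)
			return id;	/* e.g. -ENOMEM from the idr allocator */
		ctx->context_id = id;
		return 0;
	}

	static void kvm_ctx_free(struct kvm_ctx_sketch *ctx)
	{
		__destroy_context(ctx->context_id);
	}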

* [PATCH 21/27] Export KVM symbols for module
       [not found]                                                               ` <1256137413-15256-21-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
@ 2009-10-21 15:03                                                                   ` Alexander Graf
  2009-10-29  2:48                                                                   ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti

To be able to keep KVM as a module, we need to export the SLB trampoline
addresses to the module, so it knows where to jump to.

Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
---
 arch/powerpc/kvm/book3s_64_exports.c |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_exports.c

diff --git a/arch/powerpc/kvm/book3s_64_exports.c b/arch/powerpc/kvm/book3s_64_exports.c
new file mode 100644
index 0000000..5b2db38
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_exports.c
@@ -0,0 +1,24 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
+ */
+
+#include <linux/module.h>
+#include <asm/kvm_book3s.h>
+
+EXPORT_SYMBOL_GPL(kvmppc_trampoline_enter);
+EXPORT_SYMBOL_GPL(kvmppc_trampoline_lowmem);
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 22/27] Add fields to PACA
  2009-10-21 15:03                                                                   ` Alexander Graf
@ 2009-10-21 15:03                                                                     ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti

For KVM we need to store some information in the PACA, so we
need to extend it.

This patch adds KVM SLB shadow related entries to the PACA and
a field that indicates if we're inside a guest.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/include/asm/paca.h |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 7d8514c..5e9b4ef 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -129,6 +129,15 @@ struct paca_struct {
 	u64 system_time;		/* accumulated system TB ticks */
 	u64 startpurr;			/* PURR/TB value snapshot */
 	u64 startspurr;			/* SPURR value snapshot */
+
+#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
+	struct  {
+		u64     esid;
+		u64     vsid;
+	} kvm_slb[64];			/* guest SLB */
+	u8 kvm_slb_max;			/* highest used guest slb entry */
+	u8 kvm_in_guest;		/* are we inside the guest? */
+#endif
 };
 
 extern struct paca_struct paca[];
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread
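
Illustrative only, under the assumption that the world-switch code added
earlier in this series fills this cache before entering the guest: one way
the new fields could be populated so the real-mode switcher knows which
guest SLB entries to install and how many are valid. The stand-in struct
and helper name are not from this series.

	#include <linux/types.h>

	struct kvm_slb_paca_sketch {		/* mirrors the fields added above */
		struct {
			u64 esid;
			u64 vsid;
		} kvm_slb[64];
		u8 kvm_slb_max;
		u8 kvm_in_guest;
	};

	static void cache_guest_slbe(struct kvm_slb_paca_sketch *p, int i,
				     u64 esid, u64 vsid)
	{
		p->kvm_slb[i].esid = esid;
		p->kvm_slb[i].vsid = vsid;
		if (i + 1 > p->kvm_slb_max)
			p->kvm_slb_max = i + 1;	/* "highest used guest slb entry" */
	}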

* [PATCH 23/27] Export new PACA constants in asm-offsets
       [not found]                                                                     ` <1256137413-15256-23-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
@ 2009-10-21 15:03                                                                         ` Alexander Graf
  2009-10-29  2:50                                                                         ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti

In order to access fields in the PACA from assembly code, we need
to generate offsets using asm-offsets.c.

So let's add the new PACA-related bits we just introduced!

Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
---
 arch/powerpc/kernel/asm-offsets.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index aba3ea6..e2e2082 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -190,6 +190,11 @@ int main(void)
 	DEFINE(PACA_SYSTEM_TIME, offsetof(struct paca_struct, system_time));
 	DEFINE(PACA_DATA_OFFSET, offsetof(struct paca_struct, data_offset));
 	DEFINE(PACA_TRAP_SAVE, offsetof(struct paca_struct, trap_save));
+#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
+	DEFINE(PACA_KVM_IN_GUEST, offsetof(struct paca_struct, kvm_in_guest));
+	DEFINE(PACA_KVM_SLB, offsetof(struct paca_struct, kvm_slb));
+	DEFINE(PACA_KVM_SLB_MAX, offsetof(struct paca_struct, kvm_slb_max));
+#endif
 #endif /* CONFIG_PPC64 */
 
 	/* RTAS */
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 24/27] Include Book3s_64 target in buildsystem
  2009-10-21 15:03                                                                         ` Alexander Graf
@ 2009-10-21 15:03                                                                           ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti

Now we have everything in place to be able to build KVM, so let's add it
as a config option and hook it up in the Makefile.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kvm/Kconfig  |   17 +++++++++++++++++
 arch/powerpc/kvm/Makefile |   27 +++++++++++++++++++++++----
 2 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index c299268..07703f7 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -21,6 +21,23 @@ config KVM
 	select PREEMPT_NOTIFIERS
 	select ANON_INODES
 
+config KVM_BOOK3S_64_HANDLER
+	bool
+
+config KVM_BOOK3S_64
+	tristate "KVM support for PowerPC book3s_64 processors"
+	depends on EXPERIMENTAL && PPC64
+	select KVM
+	select KVM_BOOK3S_64_HANDLER
+	---help---
+	  Support running unmodified book3s_64 and book3s_32 guest kernels
+	  in virtual machines on book3s_64 host processors.
+
+	  This module provides access to the hardware capabilities through
+	  a character device node named /dev/kvm.
+
+	  If unsure, say N.
+
 config KVM_440
 	bool "KVM support for PowerPC 440 processors"
 	depends on EXPERIMENTAL && 44x
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 37655fe..56484d6 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -12,26 +12,45 @@ CFLAGS_44x_tlb.o  := -I.
 CFLAGS_e500_tlb.o := -I.
 CFLAGS_emulate.o  := -I.
 
-kvm-objs := $(common-objs-y) powerpc.o emulate.o
+common-objs-y += powerpc.o emulate.o
 obj-$(CONFIG_KVM_EXIT_TIMING) += timing.o
-obj-$(CONFIG_KVM) += kvm.o
+obj-$(CONFIG_KVM_BOOK3S_64_HANDLER) += book3s_64_exports.o
 
 AFLAGS_booke_interrupts.o := -I$(obj)
 
 kvm-440-objs := \
+	$(common-objs-y) \
 	booke.o \
 	booke_emulate.o \
 	booke_interrupts.o \
 	44x.o \
 	44x_tlb.o \
 	44x_emulate.o
-obj-$(CONFIG_KVM_440) += kvm-440.o
+kvm-objs-$(CONFIG_KVM_440) := $(kvm-440-objs)
 
 kvm-e500-objs := \
+	$(common-objs-y) \
 	booke.o \
 	booke_emulate.o \
 	booke_interrupts.o \
 	e500.o \
 	e500_tlb.o \
 	e500_emulate.o
-obj-$(CONFIG_KVM_E500) += kvm-e500.o
+kvm-objs-$(CONFIG_KVM_E500) := $(kvm-e500-objs)
+
+kvm-book3s_64-objs := \
+	$(common-objs-y) \
+	book3s.o \
+	book3s_64_emulate.o \
+	book3s_64_interrupts.o \
+	book3s_64_mmu_host.o \
+	book3s_64_mmu.o \
+	book3s_32_mmu.o
+kvm-objs-$(CONFIG_KVM_BOOK3S_64) := $(kvm-book3s_64-objs)
+
+kvm-objs := $(kvm-objs-m) $(kvm-objs-y)
+
+obj-$(CONFIG_KVM_440) += kvm.o
+obj-$(CONFIG_KVM_E500) += kvm.o
+obj-$(CONFIG_KVM_BOOK3S_64) += kvm.o
+
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 25/27] Fix trace.h
       [not found]                                                                           ` <1256137413-15256-25-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
@ 2009-10-21 15:03                                                                               ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti

It looks like the variable "pc" is already defined somewhere else. At least the
current code always failed to build for me, complaining that "pc" is already defined.

Let's use _pc instead, because that doesn't collide.

Is this the right approach? Does it break on 440 too? If not, why not?

Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
---
 arch/powerpc/kvm/trace.h |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/trace.h b/arch/powerpc/kvm/trace.h
index 67f219d..a8e8400 100644
--- a/arch/powerpc/kvm/trace.h
+++ b/arch/powerpc/kvm/trace.h
@@ -12,8 +12,8 @@
  * Tracepoint for guest mode entry.
  */
 TRACE_EVENT(kvm_ppc_instr,
-	TP_PROTO(unsigned int inst, unsigned long pc, unsigned int emulate),
-	TP_ARGS(inst, pc, emulate),
+	TP_PROTO(unsigned int inst, unsigned long _pc, unsigned int emulate),
+	TP_ARGS(inst, _pc, emulate),
 
 	TP_STRUCT__entry(
 		__field(	unsigned int,	inst		)
@@ -23,7 +23,7 @@ TRACE_EVENT(kvm_ppc_instr,
 
 	TP_fast_assign(
 		__entry->inst		= inst;
-		__entry->pc		= pc;
+		__entry->pc		= _pc;
 		__entry->emulate	= emulate;
 	),
 
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 26/27] Use Little Endian for Dirty Bitmap
  2009-10-21 15:03                                                                               ` Alexander Graf
@ 2009-10-21 15:03                                                                                 ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti

We currently use host endian long types to store information
in the dirty bitmap.

This works reasonably well on Little Endian targets, because the
u32 after the first contains the next 32 bits. On Big Endian this
breaks completely though, forcing us to be inventive here.

So Ben suggested to always use Little Endian, which looks reasonable.

We only have the dirty bitmap implemented on Little Endian targets so far,
and since PowerPC would be the first Big Endian platform, we might just as
well switch to Little Endian everywhere; that takes little effort and breaks
no existing targets.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 virt/kvm/kvm_main.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 54a272f..c565e5b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -49,6 +49,7 @@
 #include <asm/io.h>
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
+#include <asm-generic/bitops/le.h>
 
 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
 #include "coalesced_mmio.h"
@@ -1071,8 +1072,8 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
 		unsigned long rel_gfn = gfn - memslot->base_gfn;
 
 		/* avoid RMW */
-		if (!test_bit(rel_gfn, memslot->dirty_bitmap))
-			set_bit(rel_gfn, memslot->dirty_bitmap);
+		if (!generic_test_le_bit(rel_gfn, memslot->dirty_bitmap))
+			generic___set_le_bit(rel_gfn, memslot->dirty_bitmap);
 	}
 }
 
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread
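
A userspace mock-up (not kernel code) of the layout problem described above,
assuming a 64-bit big-endian host: with host-endian longs, bit 0 of the
bitmap lands in byte 7 of the first word, so a consumer walking the log as
little-endian 32-bit words finds it at the wrong offset. The fixed
little-endian layout is byte-stable regardless of host word size or
endianness.

	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		unsigned long native = 0;
		unsigned char le[sizeof(unsigned long)] = { 0 };
		unsigned char raw[sizeof(unsigned long)];
		unsigned int i;

		native |= 1UL << 0;		/* roughly what set_bit(0, ...) does */
		le[0 / 8] |= 1 << (0 % 8);	/* what generic___set_le_bit(0, ...) does */

		memcpy(raw, &native, sizeof(native));

		/* On 64-bit big-endian, raw[7] is 0x01 and raw[0..6] are 0x00;
		 * in the little-endian layout the bit is always in le[0]. */
		for (i = 0; i < sizeof(native); i++)
			printf("byte %u: native=%02x le=%02x\n", i, raw[i], le[i]);

		return 0;
	}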

* [PATCH 27/27] Use hrtimers for the decrementer
       [not found]                                                                                 ` <1256137413-15256-27-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
@ 2009-10-21 15:03                                                                                     ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti

Following S390's good example we should use hrtimers for the decrementer too!
This patch converts the timer from the old mechanism to hrtimers.

Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
---
 arch/powerpc/include/asm/kvm_host.h |    6 ++++--
 arch/powerpc/kvm/emulate.c          |   18 +++++++++++-------
 arch/powerpc/kvm/powerpc.c          |   20 ++++++++++++++++++--
 3 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2cff5fe..1201f62 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -21,7 +21,8 @@
 #define __POWERPC_KVM_HOST_H__
 
 #include <linux/mutex.h>
-#include <linux/timer.h>
+#include <linux/hrtimer.h>
+#include <linux/interrupt.h>
 #include <linux/types.h>
 #include <linux/kvm_types.h>
 #include <asm/kvm_asm.h>
@@ -250,7 +251,8 @@ struct kvm_vcpu_arch {
 
 	u32 cpr0_cfgaddr; /* holds the last set cpr0_cfgaddr */
 
-	struct timer_list dec_timer;
+	struct hrtimer dec_timer;
+	struct tasklet_struct tasklet;
 	u64 dec_jiffies;
 	unsigned long pending_exceptions;
 
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index 1ec5e07..4a9ac66 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -18,7 +18,7 @@
  */
 
 #include <linux/jiffies.h>
-#include <linux/timer.h>
+#include <linux/hrtimer.h>
 #include <linux/types.h>
 #include <linux/string.h>
 #include <linux/kvm_host.h>
@@ -79,12 +79,13 @@ static int kvmppc_dec_enabled(struct kvm_vcpu *vcpu)
 
 void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
 {
-	unsigned long nr_jiffies;
+	unsigned long dec_nsec;
 
+	pr_debug("mtDEC: %x\n", vcpu->arch.dec);
 #ifdef CONFIG_PPC64
 	/* POWER4+ triggers a dec interrupt if the value is < 0 */
 	if (vcpu->arch.dec & 0x80000000) {
-		del_timer(&vcpu->arch.dec_timer);
+		hrtimer_try_to_cancel(&vcpu->arch.dec_timer);
 		kvmppc_core_queue_dec(vcpu);
 		return;
 	}
@@ -94,12 +95,15 @@ void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
 		 * that's how we convert the guest DEC value to the number of
 		 * host ticks. */
 
+		hrtimer_try_to_cancel(&vcpu->arch.dec_timer);
+		dec_nsec = vcpu->arch.dec;
+		dec_nsec *= 1000;
+		dec_nsec /= tb_ticks_per_usec;
+		hrtimer_start(&vcpu->arch.dec_timer, ktime_set(0, dec_nsec),
+			      HRTIMER_MODE_REL);
 		vcpu->arch.dec_jiffies = get_tb();
-		nr_jiffies = vcpu->arch.dec / tb_ticks_per_jiffy;
-		mod_timer(&vcpu->arch.dec_timer,
-		          get_jiffies_64() + nr_jiffies);
 	} else {
-		del_timer(&vcpu->arch.dec_timer);
+		hrtimer_try_to_cancel(&vcpu->arch.dec_timer);
 	}
 }
 
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 4ae3490..4c582ed 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -23,6 +23,7 @@
 #include <linux/kvm_host.h>
 #include <linux/module.h>
 #include <linux/vmalloc.h>
+#include <linux/hrtimer.h>
 #include <linux/fs.h>
 #include <asm/cputable.h>
 #include <asm/uaccess.h>
@@ -209,10 +210,25 @@ static void kvmppc_decrementer_func(unsigned long data)
 	}
 }
 
+/*
+ * low level hrtimer wake routine. Because this runs in hardirq context
+ * we schedule a tasklet to do the real work.
+ */
+enum hrtimer_restart kvmppc_decrementer_wakeup(struct hrtimer *timer)
+{
+	struct kvm_vcpu *vcpu;
+
+	vcpu = container_of(timer, struct kvm_vcpu, arch.dec_timer);
+	tasklet_schedule(&vcpu->arch.tasklet);
+
+	return HRTIMER_NORESTART;
+}
+
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
-	setup_timer(&vcpu->arch.dec_timer, kvmppc_decrementer_func,
-	            (unsigned long)vcpu);
+	hrtimer_init(&vcpu->arch.dec_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS);
+	tasklet_init(&vcpu->arch.tasklet, kvmppc_decrementer_func, (ulong)vcpu);
+	vcpu->arch.dec_timer.function = kvmppc_decrementer_wakeup;
 
 	return 0;
 }
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread
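
A worked example of the DEC-to-nanoseconds conversion in kvmppc_emulate_dec()
above; the timebase frequency is an assumed figure, used purely for the
arithmetic. With tb_ticks_per_usec = 512 (a 512 MHz timebase), a guest DEC of
0x00100000 (1,048,576 ticks) arms the hrtimer for 1,048,576 * 1000 / 512 =
2,048,000 ns, roughly 2 ms.

	/* The same conversion as a standalone helper; illustration only, the
	 * patch does this inline. */
	static unsigned long dec_to_nsec(unsigned int dec,
					 unsigned long tb_ticks_per_usec)
	{
		unsigned long dec_nsec = dec;

		dec_nsec *= 1000;		/* ticks * (ns per us) */
		dec_nsec /= tb_ticks_per_usec;	/* / (ticks per us) = ns */
		return dec_nsec;
	}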

* [PATCH 27/27] Use hrtimers for the decrementer
@ 2009-10-21 15:03                                                                                     ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:03 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti

Following S390's good example we should use hrtimers for the decrementer too!
This patch converts the timer from the old mechanism to hrtimers.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/include/asm/kvm_host.h |    6 ++++--
 arch/powerpc/kvm/emulate.c          |   18 +++++++++++-------
 arch/powerpc/kvm/powerpc.c          |   20 ++++++++++++++++++--
 3 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2cff5fe..1201f62 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -21,7 +21,8 @@
 #define __POWERPC_KVM_HOST_H__
 
 #include <linux/mutex.h>
-#include <linux/timer.h>
+#include <linux/hrtimer.h>
+#include <linux/interrupt.h>
 #include <linux/types.h>
 #include <linux/kvm_types.h>
 #include <asm/kvm_asm.h>
@@ -250,7 +251,8 @@ struct kvm_vcpu_arch {
 
 	u32 cpr0_cfgaddr; /* holds the last set cpr0_cfgaddr */
 
-	struct timer_list dec_timer;
+	struct hrtimer dec_timer;
+	struct tasklet_struct tasklet;
 	u64 dec_jiffies;
 	unsigned long pending_exceptions;
 
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index 1ec5e07..4a9ac66 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -18,7 +18,7 @@
  */
 
 #include <linux/jiffies.h>
-#include <linux/timer.h>
+#include <linux/hrtimer.h>
 #include <linux/types.h>
 #include <linux/string.h>
 #include <linux/kvm_host.h>
@@ -79,12 +79,13 @@ static int kvmppc_dec_enabled(struct kvm_vcpu *vcpu)
 
 void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
 {
-	unsigned long nr_jiffies;
+	unsigned long dec_nsec;
 
+	pr_debug("mtDEC: %x\n", vcpu->arch.dec);
 #ifdef CONFIG_PPC64
 	/* POWER4+ triggers a dec interrupt if the value is < 0 */
 	if (vcpu->arch.dec & 0x80000000) {
-		del_timer(&vcpu->arch.dec_timer);
+		hrtimer_try_to_cancel(&vcpu->arch.dec_timer);
 		kvmppc_core_queue_dec(vcpu);
 		return;
 	}
@@ -94,12 +95,15 @@ void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
 		 * that's how we convert the guest DEC value to the number of
 		 * host ticks. */
 
+		hrtimer_try_to_cancel(&vcpu->arch.dec_timer);
+		dec_nsec = vcpu->arch.dec;
+		dec_nsec *= 1000;
+		dec_nsec /= tb_ticks_per_usec;
+		hrtimer_start(&vcpu->arch.dec_timer, ktime_set(0, dec_nsec),
+			      HRTIMER_MODE_REL);
 		vcpu->arch.dec_jiffies = get_tb();
-		nr_jiffies = vcpu->arch.dec / tb_ticks_per_jiffy;
-		mod_timer(&vcpu->arch.dec_timer,
-		          get_jiffies_64() + nr_jiffies);
 	} else {
-		del_timer(&vcpu->arch.dec_timer);
+		hrtimer_try_to_cancel(&vcpu->arch.dec_timer);
 	}
 }
 
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 4ae3490..4c582ed 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -23,6 +23,7 @@
 #include <linux/kvm_host.h>
 #include <linux/module.h>
 #include <linux/vmalloc.h>
+#include <linux/hrtimer.h>
 #include <linux/fs.h>
 #include <asm/cputable.h>
 #include <asm/uaccess.h>
@@ -209,10 +210,25 @@ static void kvmppc_decrementer_func(unsigned long data)
 	}
 }
 
+/*
+ * low level hrtimer wake routine. Because this runs in hardirq context
+ * we schedule a tasklet to do the real work.
+ */
+enum hrtimer_restart kvmppc_decrementer_wakeup(struct hrtimer *timer)
+{
+	struct kvm_vcpu *vcpu;
+
+	vcpu = container_of(timer, struct kvm_vcpu, arch.dec_timer);
+	tasklet_schedule(&vcpu->arch.tasklet);
+
+	return HRTIMER_NORESTART;
+}
+
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
-	setup_timer(&vcpu->arch.dec_timer, kvmppc_decrementer_func,
-	            (unsigned long)vcpu);
+	hrtimer_init(&vcpu->arch.dec_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS);
+	tasklet_init(&vcpu->arch.tasklet, kvmppc_decrementer_func, (ulong)vcpu);
+	vcpu->arch.dec_timer.function = kvmppc_decrementer_wakeup;
 
 	return 0;
 }
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread
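
For readers following along: the guest DEC register counts down in timebase
ticks, and tb_ticks_per_usec gives the host timebase rate in ticks per
microsecond, so the patch converts ticks to nanoseconds before arming the
hrtimer. A minimal sketch of that conversion, mirroring the emulate.c hunk
above (illustrative only, not the exact kernel code; it assumes
<linux/hrtimer.h> and <asm/time.h> for the names used here):

/* Illustrative sketch: arm an hrtimer for a guest DEC value given in
 * timebase ticks.  nsec = ticks * 1000 / tb_ticks_per_usec. */
static void kvmppc_arm_dec_hrtimer_example(struct hrtimer *timer, u32 dec_ticks)
{
	unsigned long dec_nsec;

	hrtimer_try_to_cancel(timer);    /* drop any pending expiry first */
	dec_nsec = dec_ticks;
	dec_nsec *= 1000;                /* ticks -> ticks * 1000 ... */
	dec_nsec /= tb_ticks_per_usec;   /* ... / (ticks per usec) = nanoseconds */
	hrtimer_start(timer, ktime_set(0, dec_nsec), HRTIMER_MODE_REL);
}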

* Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5
  2009-10-21 15:03 ` Alexander Graf
@ 2009-10-21 15:22   ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-21 15:22 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm, Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti


On 21.10.2009, at 17:03, Alexander Graf wrote:

> KVM for PowerPC only supports embedded cores at the moment.
>
> While it makes sense to virtualize on small machines, it's even more  
> fun
> to do so on big boxes. So I figured we need KVM for PowerPC64 as well.
>
> This patchset implements KVM support for Book3s_64 hosts and guest  
> support
> for Book3s_64 and G3/G4.
>
> To really make use of this, you also need a recent version of qemu.
>
>
> Don't want to apply patches? Get the git tree!
>
> $ git clone git://csgraf.de/kvm
> $ git checkout origin/ppc-v4

ppc-v5 of course. Though I'm still trying to get git to actually
serve the correct tree - sigh.

Alex


^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5
@ 2009-10-22 13:26     ` Arnd Bergmann
  0 siblings, 0 replies; 244+ messages in thread
From: Arnd Bergmann @ 2009-10-22 13:26 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-ppc,
	Hollis Blanchard, Benjamin Herrenschmidt, Kevin Wolf,
	bphilips-l3A5Bk7waGM, Marcelo Tosatti

On Wednesday 21 October 2009, Alexander Graf wrote:
> 
> KVM for PowerPC only supports embedded cores at the moment.
> 
> While it makes sense to virtualize on small machines, it's even more fun
> to do so on big boxes. So I figured we need KVM for PowerPC64 as well.
> 
> This patchset implements KVM support for Book3s_64 hosts and guest support
> for Book3s_64 and G3/G4.
> 
> To really make use of this, you also need a recent version of qemu.
> 
> 
> Don't want to apply patches? Get the git tree!
> 
> $ git clone git://csgraf.de/kvm
> $ git checkout origin/ppc-v4

Whole series Acked-by: Arnd Bergmann <arnd@arndb.de>

Great work, Alex!

	Arnd <><

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5
@ 2009-10-23  0:33     ` Hollis Blanchard
  0 siblings, 0 replies; 244+ messages in thread
From: Hollis Blanchard @ 2009-10-23  0:33 UTC (permalink / raw)
  To: Alexander Graf, Avi Kivity
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, kvm-ppc, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti

On Wed, 2009-10-21 at 17:03 +0200, Alexander Graf wrote:
> KVM for PowerPC only supports embedded cores at the moment.
> 
> While it makes sense to virtualize on small machines, it's even more fun
> to do so on big boxes. So I figured we need KVM for PowerPC64 as well.
> 
> This patchset implements KVM support for Book3s_64 hosts and guest support
> for Book3s_64 and G3/G4.

Acked-by: Hollis Blanchard <hollisb@us.ibm.com>

Avi, please apply these patches, and one more (unrelated) to fix the
Book E build that I will send in just a moment.

-- 
Hollis Blanchard
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5
@ 2009-10-25 13:01         ` Avi Kivity
  0 siblings, 0 replies; 244+ messages in thread
From: Avi Kivity @ 2009-10-25 13:01 UTC (permalink / raw)
  To: Hollis Blanchard
  Cc: Alexander Graf, kvm-u79uwXL29TY76Z2rM5mHXA, kvm-ppc,
	Arnd Bergmann, Benjamin Herrenschmidt, Kevin Wolf,
	bphilips-l3A5Bk7waGM, Marcelo Tosatti

On 10/23/2009 02:33 AM, Hollis Blanchard wrote:
> On Wed, 2009-10-21 at 17:03 +0200, Alexander Graf wrote:
>    
>> KVM for PowerPC only supports embedded cores at the moment.
>>
>> While it makes sense to virtualize on small machines, it's even more fun
>> to do so on big boxes. So I figured we need KVM for PowerPC64 as well.
>>
>> This patchset implements KVM support for Book3s_64 hosts and guest support
>> for Book3s_64 and G3/G4.
>>      
> Acked-by: Hollis Blanchard<hollisb@us.ibm.com>
>
> Avi, please apply these patches
>    

I still need acks for the arch/powerpc/{kernel,mm} bits, simple as they 
are, from the powerpc maintainers.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5
@ 2009-10-26 21:18             ` Hollis Blanchard
  0 siblings, 0 replies; 244+ messages in thread
From: Hollis Blanchard @ 2009-10-26 21:18 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Alexander Graf, kvm-u79uwXL29TY76Z2rM5mHXA, kvm-ppc,
	Arnd Bergmann, Benjamin Herrenschmidt, Kevin Wolf,
	bphilips-l3A5Bk7waGM, Marcelo Tosatti

On Sun, 2009-10-25 at 15:01 +0200, Avi Kivity wrote:
> On 10/23/2009 02:33 AM, Hollis Blanchard wrote:
> > On Wed, 2009-10-21 at 17:03 +0200, Alexander Graf wrote:
> >    
> >> KVM for PowerPC only supports embedded cores at the moment.
> >>
> >> While it makes sense to virtualize on small machines, it's even more fun
> >> to do so on big boxes. So I figured we need KVM for PowerPC64 as well.
> >>
> >> This patchset implements KVM support for Book3s_64 hosts and guest support
> >> for Book3s_64 and G3/G4.
> >>      
> > Acked-by: Hollis Blanchard<hollisb@us.ibm.com>
> >
> > Avi, please apply these patches
> >    
> 
> I still need acks for the arch/powerpc/{kernel,mm} bits, simple as they 
> are, from the powerpc maintainers.

OK, BenH says they're on his todo list.

In the meantime, please apply patch #2, because it fixes the broken qemu
build.

-- 
Hollis Blanchard
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5
@ 2009-10-26 23:06   ` Olof Johansson
  0 siblings, 0 replies; 244+ messages in thread
From: Olof Johansson @ 2009-10-26 23:06 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm, Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti

Not sure which patch in the series this is needed for since I applied
them all, but I got:

  CC      arch/powerpc/kvm/timing.o
arch/powerpc/kvm/timing.c:205: error: 'THIS_MODULE' undeclared here (not in a function)


Signed-off-by: Olof Johansson <olof@lixom.net>


diff --git a/arch/powerpc/kvm/timing.c b/arch/powerpc/kvm/timing.c
index 2aa371e..7037855 100644
--- a/arch/powerpc/kvm/timing.c
+++ b/arch/powerpc/kvm/timing.c
@@ -23,6 +23,7 @@
 #include <linux/seq_file.h>
 #include <linux/debugfs.h>
 #include <linux/uaccess.h>
+#include <linux/module.h>
 
 #include <asm/time.h>
 #include <asm-generic/div64.h>

^ permalink raw reply related	[flat|nested] 244+ messages in thread
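
The THIS_MODULE failure is the classic symptom of a file that uses
THIS_MODULE in a static initializer (for timing.c, presumably the debugfs
file_operations around line 205) without including <linux/module.h>, so it
only builds when some other header happens to pull that definition in. A
hedged, self-contained illustration of the pattern (not the actual
timing.c contents):

/* Illustrative only -- not the real arch/powerpc/kvm/timing.c code.
 * THIS_MODULE comes from <linux/module.h>; a static initializer that
 * uses it breaks whenever that header is not pulled in indirectly. */
#include <linux/module.h>
#include <linux/debugfs.h>
#include <linux/seq_file.h>

static int example_show(struct seq_file *m, void *p)
{
	seq_puts(m, "example\n");
	return 0;
}

static int example_open(struct inode *inode, struct file *file)
{
	return single_open(file, example_show, inode->i_private);
}

static const struct file_operations example_fops = {
	.owner   = THIS_MODULE,	/* needs <linux/module.h> */
	.open    = example_open,
	.read    = seq_read,
	.llseek  = seq_lseek,
	.release = single_release,
};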

* Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5
@ 2009-10-26 23:20       ` Hollis Blanchard
  0 siblings, 0 replies; 244+ messages in thread
From: Hollis Blanchard @ 2009-10-26 23:20 UTC (permalink / raw)
  To: Olof Johansson
  Cc: Alexander Graf, kvm-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-ppc,
	Arnd Bergmann, Benjamin Herrenschmidt, Kevin Wolf,
	bphilips-l3A5Bk7waGM, Marcelo Tosatti

On Mon, 2009-10-26 at 18:06 -0500, Olof Johansson wrote:
> Not sure which patch in the series this is needed for since I applied
> them all, but I got:
> 
>   CC      arch/powerpc/kvm/timing.o
> arch/powerpc/kvm/timing.c:205: error: 'THIS_MODULE' undeclared here (not in a function)
> 
> 
> Signed-off-by: Olof Johansson <olof@lixom.net>
> 
> 
> diff --git a/arch/powerpc/kvm/timing.c b/arch/powerpc/kvm/timing.c
> index 2aa371e..7037855 100644
> --- a/arch/powerpc/kvm/timing.c
> +++ b/arch/powerpc/kvm/timing.c
> @@ -23,6 +23,7 @@
>  #include <linux/seq_file.h>
>  #include <linux/debugfs.h>
>  #include <linux/uaccess.h>
> +#include <linux/module.h>
> 
>  #include <asm/time.h>
>  #include <asm-generic/div64.h>

For some reason, I'm not seeing this build break, but the patch is
obviously correct.

Acked-by: Hollis Blanchard <hollisb@us.ibm.com>

-- 
Hollis Blanchard
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5
@ 2009-10-26 23:21         ` Olof Johansson
  0 siblings, 0 replies; 244+ messages in thread
From: Olof Johansson @ 2009-10-26 23:21 UTC (permalink / raw)
  To: Hollis Blanchard
  Cc: Alexander Graf, kvm-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-ppc,
	Arnd Bergmann, Benjamin Herrenschmidt, Kevin Wolf,
	bphilips-l3A5Bk7waGM, Marcelo Tosatti

On Oct 26, 2009, at 6:20 PM, Hollis Blanchard wrote:


> For some reason, I'm not seeing this build break, but the patch is
> obviously correct.
>
> Acked-by: Hollis Blanchard <hollisb@us.ibm.com>

I saw it when building with pasemi_defconfig + manually enabled KVM  
options (all available).


-Olof

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5
  2009-10-26 23:21         ` Olof Johansson
@ 2009-10-27  8:56           ` Avi Kivity
  -1 siblings, 0 replies; 244+ messages in thread
From: Avi Kivity @ 2009-10-27  8:56 UTC (permalink / raw)
  To: Olof Johansson
  Cc: Hollis Blanchard, Alexander Graf, kvm, kvm-ppc, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti

On 10/27/2009 01:21 AM, Olof Johansson wrote:
> On Oct 26, 2009, at 6:20 PM, Hollis Blanchard wrote:
>
>
>> For some reason, I'm not seeing this build break, but the patch is
>> obviously correct.
>>
>> Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
>
> I saw it when building with pasemi_defconfig + manually enabled KVM 
> options (all available).
>

Alex, can you fold this patch in?

No need to repost; just update your git tree.

(btw, please make sure the patchset is bisectable).

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5
  2009-10-27  8:56           ` Avi Kivity
@ 2009-10-27 13:42             ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-27 13:42 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Olof Johansson, Hollis Blanchard, kvm, kvm-ppc, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti


On 27.10.2009, at 09:56, Avi Kivity wrote:

> On 10/27/2009 01:21 AM, Olof Johansson wrote:
>> On Oct 26, 2009, at 6:20 PM, Hollis Blanchard wrote:
>>
>>
>>> For some reason, I'm not seeing this build break, but the patch is
>>> obviously correct.
>>>
>>> Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
>>
>> I saw it when building with pasemi_defconfig + manually enabled KVM  
>> options (all available).
>>
>
> Alex, can you fold this patch in?

I can, but it's only partly related. My patches don't even touch
timing.c. The only thing I can imagine causing a breakage is that
my patches allow building KVM with an =m (module) setting.

So IMHO this patch should be applied before my series. Should I stick
it as the first patch in my git tree?

Alex

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5
       [not found]             ` <8E92E3B9-39D5-4D71-8B8E-96B49430B67B-l3A5Bk7waGM@public.gmane.org>
@ 2009-10-27 15:49                 ` Avi Kivity
  0 siblings, 0 replies; 244+ messages in thread
From: Avi Kivity @ 2009-10-27 15:49 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Olof Johansson, Hollis Blanchard, kvm-u79uwXL29TY76Z2rM5mHXA,
	kvm-ppc, Arnd Bergmann, Benjamin Herrenschmidt, Kevin Wolf,
	bphilips-l3A5Bk7waGM, Marcelo Tosatti

On 10/27/2009 03:42 PM, Alexander Graf wrote:
>
> I can, but it's only partly related. My patches don't even touch 
> timing.c. The only thing I can imagine resulting in a breakage is that 
> my patches allow for an =M setting.
>
> So IMHO this patch should be applied before my series. Should I stick 
> it as first patch in my git tree?
>

No need, I'll apply it independently.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 17/27] Make head_64.S aware of KVM real mode code
@ 2009-10-29  2:45                                                         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 244+ messages in thread
From: Benjamin Herrenschmidt @ 2009-10-29  2:45 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-ppc,
	Hollis Blanchard, Arnd Bergmann, Kevin Wolf,
	bphilips-l3A5Bk7waGM, Marcelo Tosatti

On Wed, 2009-10-21 at 17:03 +0200, Alexander Graf wrote:
> We need to run some KVM trampoline code in real mode. Unfortunately, real mode
> only covers 8MB on Cell so we need to squeeze ourselves as low as possible.
> 
> Also, we need to trap interrupts to get us back from guest state to host state
> without telling Linux about it.
> 
> This patch adds interrupt traps and includes the KVM code that requires real
> mode in the real mode parts of Linux.
> 
> Signed-off-by: Alexander Graf <agraf@suse.de>

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

> ---
>  arch/powerpc/include/asm/exception-64s.h |    2 ++
>  arch/powerpc/kernel/exceptions-64s.S     |    8 ++++++++
>  arch/powerpc/kernel/head_64.S            |    7 +++++++
>  3 files changed, 17 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
> index a98653b..57c4000 100644
> --- a/arch/powerpc/include/asm/exception-64s.h
> +++ b/arch/powerpc/include/asm/exception-64s.h
> @@ -147,6 +147,7 @@
>  	.globl label##_pSeries;				\
>  label##_pSeries:					\
>  	HMT_MEDIUM;					\
> +	DO_KVM	n;					\
>  	mtspr	SPRN_SPRG_SCRATCH0,r13;		/* save r13 */	\
>  	EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, label##_common)
>  
> @@ -170,6 +171,7 @@ label##_pSeries:					\
>  	.globl label##_pSeries;						\
>  label##_pSeries:							\
>  	HMT_MEDIUM;							\
> +	DO_KVM	n;							\
>  	mtspr	SPRN_SPRG_SCRATCH0,r13;	/* save r13 */			\
>  	mfspr	r13,SPRN_SPRG_PACA;	/* get paca address into r13 */	\
>  	std	r9,PACA_EXGEN+EX_R9(r13);	/* save r9, r10 */	\
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 1808876..fc3ead0 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -41,6 +41,7 @@ __start_interrupts:
>  	. = 0x200
>  _machine_check_pSeries:
>  	HMT_MEDIUM
> +	DO_KVM	0x200
>  	mtspr	SPRN_SPRG_SCRATCH0,r13		/* save r13 */
>  	EXCEPTION_PROLOG_PSERIES(PACA_EXMC, machine_check_common)
>  
> @@ -48,6 +49,7 @@ _machine_check_pSeries:
>  	.globl data_access_pSeries
>  data_access_pSeries:
>  	HMT_MEDIUM
> +	DO_KVM	0x300
>  	mtspr	SPRN_SPRG_SCRATCH0,r13
>  BEGIN_FTR_SECTION
>  	mfspr	r13,SPRN_SPRG_PACA
> @@ -77,6 +79,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_SLB)
>  	.globl data_access_slb_pSeries
>  data_access_slb_pSeries:
>  	HMT_MEDIUM
> +	DO_KVM	0x380
>  	mtspr	SPRN_SPRG_SCRATCH0,r13
>  	mfspr	r13,SPRN_SPRG_PACA		/* get paca address into r13 */
>  	std	r3,PACA_EXSLB+EX_R3(r13)
> @@ -115,6 +118,7 @@ data_access_slb_pSeries:
>  	.globl instruction_access_slb_pSeries
>  instruction_access_slb_pSeries:
>  	HMT_MEDIUM
> +	DO_KVM	0x480
>  	mtspr	SPRN_SPRG_SCRATCH0,r13
>  	mfspr	r13,SPRN_SPRG_PACA		/* get paca address into r13 */
>  	std	r3,PACA_EXSLB+EX_R3(r13)
> @@ -154,6 +158,7 @@ instruction_access_slb_pSeries:
>  	.globl	system_call_pSeries
>  system_call_pSeries:
>  	HMT_MEDIUM
> +	DO_KVM	0xc00
>  BEGIN_FTR_SECTION
>  	cmpdi	r0,0x1ebe
>  	beq-	1f
> @@ -186,12 +191,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
>  	 * trickery is thus necessary
>  	 */
>  	. = 0xf00
> +	DO_KVM	0xf00
>  	b	performance_monitor_pSeries
>  
>  	. = 0xf20
> +	DO_KVM	0xf20
>  	b	altivec_unavailable_pSeries
>  
>  	. = 0xf40
> +	DO_KVM	0xf40
>  	b	vsx_unavailable_pSeries
>  
>  #ifdef CONFIG_CBE_RAS
> diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
> index c38afdb..9258074 100644
> --- a/arch/powerpc/kernel/head_64.S
> +++ b/arch/powerpc/kernel/head_64.S
> @@ -37,6 +37,7 @@
>  #include <asm/firmware.h>
>  #include <asm/page_64.h>
>  #include <asm/irqflags.h>
> +#include <asm/kvm_book3s_64_asm.h>
>  
>  /* The physical memory is layed out such that the secondary processor
>   * spin code sits at 0x0000...0x00ff. On server, the vectors follow
> @@ -165,6 +166,12 @@ exception_marker:
>  #include "exceptions-64s.S"
>  #endif
>  
> +/* KVM trampoline code needs to be close to the interrupt handlers */
> +
> +#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
> +#include "../kvm/book3s_64_rmhandlers.S"
> +#endif
> +
>  _GLOBAL(generic_secondary_thread_init)
>  	mr	r24,r3
>  



^ permalink raw reply	[flat|nested] 244+ messages in thread
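
DO_KVM itself is PowerPC assembly provided by the KVM real-mode handlers
(book3s_64_rmhandlers.S, included further down in this patch), so its body
is not visible in these hunks. Conceptually it checks the kvm_in_guest flag
in the PACA and, if an exception interrupted guest execution, diverts it
into KVM's trampoline instead of the normal Linux handler. A purely
illustrative C rendering of that decision (the real code is real-mode
assembly and may differ in detail):

/* Purely illustrative -- the real DO_KVM macro is real-mode assembly and
 * operates on the PACA fields added later in this series. */
struct example_paca {
	unsigned char kvm_in_guest;	/* are we inside the guest? */
};

enum exception_target { HOST_HANDLER, KVM_TRAMPOLINE };

static enum exception_target do_kvm_decision(const struct example_paca *paca)
{
	/* If the exception hit while guest code was running, KVM must get
	 * control first so it can restore the host context. */
	if (paca->kvm_in_guest)
		return KVM_TRAMPOLINE;
	return HOST_HANDLER;
}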

* Re: [PATCH 18/27] Add Book3s_64 offsets to asm-offsets.c
@ 2009-10-29  2:45                                                           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 244+ messages in thread
From: Benjamin Herrenschmidt @ 2009-10-29  2:45 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-ppc,
	Hollis Blanchard, Arnd Bergmann, Kevin Wolf,
	bphilips-l3A5Bk7waGM, Marcelo Tosatti

On Wed, 2009-10-21 at 17:03 +0200, Alexander Graf wrote:
> We need to access some VCPU fields from assembly code. In order to get
> the proper offsets, we have to define them in asm-offsets.c.
> 
> Signed-off-by: Alexander Graf <agraf@suse.de>

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

> ---
>  arch/powerpc/kernel/asm-offsets.c |   13 +++++++++++++
>  1 files changed, 13 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index 0812b0f..aba3ea6 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -398,6 +398,19 @@ int main(void)
>  	DEFINE(VCPU_LAST_INST, offsetof(struct kvm_vcpu, arch.last_inst));
>  	DEFINE(VCPU_FAULT_DEAR, offsetof(struct kvm_vcpu, arch.fault_dear));
>  	DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr));
> +
> +	/* book3s_64 */
> +#ifdef CONFIG_PPC64
> +	DEFINE(VCPU_FAULT_DSISR, offsetof(struct kvm_vcpu, arch.fault_dsisr));
> +	DEFINE(VCPU_HOST_RETIP, offsetof(struct kvm_vcpu, arch.host_retip));
> +	DEFINE(VCPU_HOST_R2, offsetof(struct kvm_vcpu, arch.host_r2));
> +	DEFINE(VCPU_HOST_MSR, offsetof(struct kvm_vcpu, arch.host_msr));
> +	DEFINE(VCPU_SHADOW_MSR, offsetof(struct kvm_vcpu, arch.shadow_msr));
> +	DEFINE(VCPU_TRAMPOLINE_LOWMEM, offsetof(struct kvm_vcpu, arch.trampoline_lowmem));
> +	DEFINE(VCPU_TRAMPOLINE_ENTER, offsetof(struct kvm_vcpu, arch.trampoline_enter));
> +	DEFINE(VCPU_HIGHMEM_HANDLER, offsetof(struct kvm_vcpu, arch.highmem_handler));
> +	DEFINE(VCPU_HFLAGS, offsetof(struct kvm_vcpu, arch.hflags));
> +#endif
>  #endif
>  #ifdef CONFIG_44x
>  	DEFINE(PGD_T_LOG2, PGD_T_LOG2);



^ permalink raw reply	[flat|nested] 244+ messages in thread
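
For readers unfamiliar with asm-offsets.c: the DEFINE() entries do not
produce C code that gets linked anywhere. kbuild compiles asm-offsets.c to
assembly, and a script rewrites the emitted markers into #define lines in
the generated asm-offsets.h, which the .S files then include. The macro
behind it looks roughly like this (paraphrased from include/linux/kbuild.h):

/* Roughly how the kernel's DEFINE() works: the asm() emits a
 * "->SYMBOL value" marker into the compiler's assembly output, which a
 * kbuild script turns into "#define SYMBOL value" in asm-offsets.h. */
#define DEFINE(sym, val) \
	asm volatile("\n->" #sym " %0 " #val : : "i" (val))

#define BLANK() asm volatile("\n->" : : )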

* Re: [PATCH 19/27] Export symbols for KVM module
@ 2009-10-29  2:46                                                               ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 244+ messages in thread
From: Benjamin Herrenschmidt @ 2009-10-29  2:46 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-ppc,
	Hollis Blanchard, Arnd Bergmann, Kevin Wolf,
	bphilips-l3A5Bk7waGM, Marcelo Tosatti

On Wed, 2009-10-21 at 17:03 +0200, Alexander Graf wrote:
> We want to be able to build KVM as a module. To enable us doing so, we
> need some more exports from core Linux parts.
> 
> This patch exports all functions and variables that are required for KVM.
> 
> Signed-off-by: Alexander Graf <agraf@suse.de>

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

Quick nit:

>  u16 mmu_slb_size = 64;
> +EXPORT_SYMBOL_GPL(mmu_slb_size);

This value might change when doing a partition migration between a
POWER6 and a POWER7, so KVM might need to be extra careful...

Ben.



^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 20/27] Split init_new_context and destroy_context
@ 2009-10-29  2:48                                                                   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 244+ messages in thread
From: Benjamin Herrenschmidt @ 2009-10-29  2:48 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-ppc,
	Hollis Blanchard, Arnd Bergmann, Kevin Wolf,
	bphilips-l3A5Bk7waGM, Marcelo Tosatti

On Wed, 2009-10-21 at 17:03 +0200, Alexander Graf wrote:
> For KVM we need to allocate a new context id, but don't really care about
> all the mm context around it.
> 
> So let's split the alloc and destroy functions for the context id, so we can
> grab one without allocating an mm context.

No objection. It might have been better to call the low-level functions
something like __get_mmu_context_id() / __put_mmu_context_id(), but
we can rename later.

> Signed-off-by: Alexander Graf <agraf@suse.de>

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

> ---
>  arch/powerpc/include/asm/mmu_context.h |    5 +++++
>  arch/powerpc/mm/mmu_context_hash64.c   |   24 +++++++++++++++++++++---
>  2 files changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index b34e94d..66b35d0 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -23,6 +23,11 @@ extern void switch_slb(struct task_struct *tsk, struct mm_struct *mm);
>  extern void set_context(unsigned long id, pgd_t *pgd);
>  
>  #ifdef CONFIG_PPC_BOOK3S_64
> +extern int __init_new_context(void);
> +extern void __destroy_context(int context_id);
> +#endif
> +
> +#ifdef CONFIG_PPC_BOOK3S_64
>  static inline void mmu_context_init(void) { }
>  #else
>  extern void mmu_context_init(void);
> diff --git a/arch/powerpc/mm/mmu_context_hash64.c b/arch/powerpc/mm/mmu_context_hash64.c
> index dbeb86a..b9e4cc2 100644
> --- a/arch/powerpc/mm/mmu_context_hash64.c
> +++ b/arch/powerpc/mm/mmu_context_hash64.c
> @@ -18,6 +18,7 @@
>  #include <linux/mm.h>
>  #include <linux/spinlock.h>
>  #include <linux/idr.h>
> +#include <linux/module.h>
>  
>  #include <asm/mmu_context.h>
>  
> @@ -32,7 +33,7 @@ static DEFINE_IDR(mmu_context_idr);
>  #define NO_CONTEXT	0
>  #define MAX_CONTEXT	((1UL << 19) - 1)
>  
> -int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
> +int __init_new_context(void)
>  {
>  	int index;
>  	int err;
> @@ -57,6 +58,18 @@ again:
>  		return -ENOMEM;
>  	}
>  
> +	return index;
> +}
> +EXPORT_SYMBOL_GPL(__init_new_context);
> +
> +int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
> +{
> +	int index;
> +
> +	index = __init_new_context();
> +	if (index < 0)
> +		return index;
> +
>  	/* The old code would re-promote on fork, we don't do that
>  	 * when using slices as it could cause problem promoting slices
>  	 * that have been forced down to 4K
> @@ -68,11 +81,16 @@ again:
>  	return 0;
>  }
>  
> -void destroy_context(struct mm_struct *mm)
> +void __destroy_context(int context_id)
>  {
>  	spin_lock(&mmu_context_lock);
> -	idr_remove(&mmu_context_idr, mm->context.id);
> +	idr_remove(&mmu_context_idr, context_id);
>  	spin_unlock(&mmu_context_lock);
> +}
> +EXPORT_SYMBOL_GPL(__destroy_context);
>  
> +void destroy_context(struct mm_struct *mm)
> +{
> +	__destroy_context(mm->context.id);
>  	mm->context.id = NO_CONTEXT;
>  }



^ permalink raw reply	[flat|nested] 244+ messages in thread
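
To see what the split buys KVM, here is a hedged sketch of a caller that
grabs a bare context id for guest use and releases it on teardown, without
ever building an mm_struct around it (illustrative only; the real caller
lives in the Book3s host MMU code elsewhere in this series):

/* Illustrative sketch -- not the actual Book3s host MMU code. */
#include <asm/mmu_context.h>

struct example_guest_mmu {
	int context_id;
};

static int example_guest_mmu_init(struct example_guest_mmu *mmu)
{
	int id = __init_new_context();	/* context id only, no mm_struct */

	if (id < 0)
		return id;
	mmu->context_id = id;
	return 0;
}

static void example_guest_mmu_destroy(struct example_guest_mmu *mmu)
{
	__destroy_context(mmu->context_id);	/* give the id back to the IDR */
}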

* Re: [PATCH 22/27] Add fields to PACA
@ 2009-10-29  2:50                                                                         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 244+ messages in thread
From: Benjamin Herrenschmidt @ 2009-10-29  2:50 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, Avi Kivity, kvm-ppc,
	Hollis Blanchard, Arnd Bergmann, Kevin Wolf,
	bphilips-l3A5Bk7waGM, Marcelo Tosatti

On Wed, 2009-10-21 at 17:03 +0200, Alexander Graf wrote:
> For KVM we need to store some information in the PACA, so we
> need to extend it.
> 
> This patch adds KVM SLB shadow related entries to the PACA and
> a field that indicates if we're inside a guest.
> 
> Signed-off-by: Alexander Graf <agraf@suse.de>
> ---

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

>  arch/powerpc/include/asm/paca.h |    9 +++++++++
>  1 files changed, 9 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> index 7d8514c..5e9b4ef 100644
> --- a/arch/powerpc/include/asm/paca.h
> +++ b/arch/powerpc/include/asm/paca.h
> @@ -129,6 +129,15 @@ struct paca_struct {
>  	u64 system_time;		/* accumulated system TB ticks */
>  	u64 startpurr;			/* PURR/TB value snapshot */
>  	u64 startspurr;			/* SPURR value snapshot */
> +
> +#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
> +	struct  {
> +		u64     esid;
> +		u64     vsid;
> +	} kvm_slb[64];			/* guest SLB */
> +	u8 kvm_slb_max;			/* highest used guest slb entry */
> +	u8 kvm_in_guest;		/* are we inside the guest? */
> +#endif
>  };
>  
>  extern struct paca_struct paca[];



^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 19/27] Export symbols for KVM module
  2009-10-29  2:46                                                               ` Benjamin Herrenschmidt
@ 2009-10-29  2:53                                                                 ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-29  2:53 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: kvm, Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Kevin Wolf, bphilips, Marcelo Tosatti


On 29.10.2009, at 03:46, Benjamin Herrenschmidt wrote:

> On Wed, 2009-10-21 at 17:03 +0200, Alexander Graf wrote:
>> We want to be able to build KVM as a module. To enable us to do so, we
>> need some more exports from core Linux parts.
>>
>> This patch exports all functions and variables that are required  
>> for KVM.
>>
>> Signed-off-by: Alexander Graf <agraf@suse.de>
>
> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>
> Quick nit:
>
>> u16 mmu_slb_size = 64;
>> +EXPORT_SYMBOL_GPL(mmu_slb_size);
>
> This value might change when doing a partition migration between a
> POWER6 and a POWER7 so KVM might need to be extra careful...

It shouldn't be an issue.

The SLB size for the guest and the SLB size we project things onto on
the host are completely independent. So when we migrate, only the
guest information is transferred to the new host, which then uses its
own SLB information to regenerate the shadow entries.

So everything's fine :-)
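
To illustrate with a minimal sketch (the function name is made up, this
is not code from the series): the host only ever fills as many shadow
SLB slots as its own mmu_slb_size allows, regardless of what SLB size
the guest believes it has.

/* illustrative only */
static inline bool example_shadow_slb_full(int next_shadow_slot)
{
        extern u16 mmu_slb_size;        /* the symbol exported by the patch */

        return next_shadow_slot >= mmu_slb_size;
}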

Alex


^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5
       [not found]         ` <4AE44C14.8040507-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2009-10-29  2:55             ` Benjamin Herrenschmidt
  2009-10-29  2:55             ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 244+ messages in thread
From: Benjamin Herrenschmidt @ 2009-10-29  2:55 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Hollis Blanchard, Alexander Graf, kvm-u79uwXL29TY76Z2rM5mHXA,
	kvm-ppc, Arnd Bergmann, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti

On Sun, 2009-10-25 at 15:01 +0200, Avi Kivity wrote:
> On 10/23/2009 02:33 AM, Hollis Blanchard wrote:
> > On Wed, 2009-10-21 at 17:03 +0200, Alexander Graf wrote:
> >    
> >> KVM for PowerPC only supports embedded cores at the moment.
> >>
> >> While it makes sense to virtualize on small machines, it's even more fun
> >> to do so on big boxes. So I figured we need KVM for PowerPC64 as well.
> >>
> >> This patchset implements KVM support for Book3s_64 hosts and guest support
> >> for Book3s_64 and G3/G4.
> >>      
> > Acked-by: Hollis Blanchard<hollisb-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> >
> > Avi, please apply these patches
> >    
> 
> I still need acks for the arch/powerpc/{kernel,mm} bits, simple as they 
> are, from the powerpc maintainers.

You should have all you need by now. Let me know if you need more :-)

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 244+ messages in thread

* [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v6
@ 2009-10-30 15:47 ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti, Olof Johansson,
	linuxppc-dev-mnsaURCQ41sdnm+yROfE0A

KVM for PowerPC only supports embedded cores at the moment.

While it makes sense to virtualize on small machines, it's even more fun
to do so on big boxes. So I figured we need KVM for PowerPC64 as well.

This patchset implements KVM support for Book3s_64 hosts and guest support
for Book3s_64 and G3/G4.

To really make use of this, you also need a recent version of qemu.

This time around I have no git tree to pull from. Sorry guys :-).

V1 -> V2:

 - extend sregs with padding
 - new naming scheme (ppc64 -> book3s_64; 74xx -> book3s_32)
 - to_phys -> in-kernel tophys()
 - loadimm -> LOAD_REG_IMMEDIATE
 - call .ko kvm.ko
 - set magic paca bit later
 - run guest code with PACA->soft_enabled=true
 - pt_regs for host state saving (guest too?)
 - only do HV dcbz trick on 970
 - refuse to run on LPAR because of missing SLB pieces

V2 -> V3:

 - fix DAR/DSISR saving
 - allow running on LPAR by modifying the SLB shadow
 - change the SLB implementation to use a mem-backed cache and do
   full world switch on enter/exit. gets rid of "context" magic
 - be more aggressive about DEC injection
 - remove fast ld/st because we're always in host context
 - don't use SPRGs in real->paged transition
 - implement dirty log
 - remove MMIO speedup code
 - SPRG cleanup
   - rename SPRG3 -> SPRN_SPRG_PACA
   - rename SPRG1 -> SPRN_SPRG_SCRATCH0
   - don't use SPRG2

V3 -> V4:

 - use context_id instead of mm_alloc
 - export less

V4 -> V5:

 - use get_tb instead of mftb
 - make ppc32 and ppc64 emulation share more code
 - make pvr 32 bits
 - add patch to use hrtimer for decrementer

V5 -> V6:

 - // -> /* */ style comments
 - make code easier to read
 - don't take mmap_sem when it's not needed

Alexander Graf (27):
  Move dirty logging code to sub-arch
  Pass PVR in sregs
  Add Book3s definitions
  Add Book3s fields to vcpu structs
  Add asm/kvm_book3s.h
  Add Book3s_64 intercept helpers
  Add book3s_64 highmem asm code
  Add SLB switching code for entry/exit
  Add interrupt handling code
  Add book3s.c
  Add book3s_64 Host MMU handling
  Add book3s_64 guest MMU
  Add book3s_32 guest MMU
  Add book3s_64 specific opcode emulation
  Add mfdec emulation
  Add desktop PowerPC specific emulation
  Make head_64.S aware of KVM real mode code
  Add Book3s_64 offsets to asm-offsets.c
  Export symbols for KVM module
  Split init_new_context and destroy_context
  Export KVM symbols for module
  Add fields to PACA
  Export new PACA constants in asm-offsets
  Include Book3s_64 target in buildsystem
  Fix trace.h
  Use Little Endian for Dirty Bitmap
  Use hrtimers for the decrementer

 arch/powerpc/include/asm/exception-64s.h     |    2 +
 arch/powerpc/include/asm/kvm.h               |    2 +
 arch/powerpc/include/asm/kvm_asm.h           |   39 ++
 arch/powerpc/include/asm/kvm_book3s.h        |  136 ++++
 arch/powerpc/include/asm/kvm_book3s_64_asm.h |   58 ++
 arch/powerpc/include/asm/kvm_host.h          |   79 +++-
 arch/powerpc/include/asm/kvm_ppc.h           |    1 +
 arch/powerpc/include/asm/mmu_context.h       |    5 +
 arch/powerpc/include/asm/paca.h              |    9 +
 arch/powerpc/kernel/asm-offsets.c            |   18 +
 arch/powerpc/kernel/exceptions-64s.S         |    8 +
 arch/powerpc/kernel/head_64.S                |    7 +
 arch/powerpc/kernel/ppc_ksyms.c              |    3 +-
 arch/powerpc/kernel/time.c                   |    1 +
 arch/powerpc/kvm/Kconfig                     |   17 +
 arch/powerpc/kvm/Makefile                    |   27 +-
 arch/powerpc/kvm/book3s.c                    |  925 ++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_32_mmu.c             |  372 +++++++++++
 arch/powerpc/kvm/book3s_64_emulate.c         |  337 ++++++++++
 arch/powerpc/kvm/book3s_64_exports.c         |   24 +
 arch/powerpc/kvm/book3s_64_interrupts.S      |  392 +++++++++++
 arch/powerpc/kvm/book3s_64_mmu.c             |  476 +++++++++++++
 arch/powerpc/kvm/book3s_64_mmu_host.c        |  408 ++++++++++++
 arch/powerpc/kvm/book3s_64_rmhandlers.S      |  131 ++++
 arch/powerpc/kvm/book3s_64_slb.S             |  277 ++++++++
 arch/powerpc/kvm/booke.c                     |    5 +
 arch/powerpc/kvm/emulate.c                   |   66 ++-
 arch/powerpc/kvm/powerpc.c                   |   25 +-
 arch/powerpc/kvm/trace.h                     |    6 +-
 arch/powerpc/mm/hash_utils_64.c              |    2 +
 arch/powerpc/mm/mmu_context_hash64.c         |   24 +-
 virt/kvm/kvm_main.c                          |    5 +-
 32 files changed, 3853 insertions(+), 34 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kvm_book3s.h
 create mode 100644 arch/powerpc/include/asm/kvm_book3s_64_asm.h
 create mode 100644 arch/powerpc/kvm/book3s.c
 create mode 100644 arch/powerpc/kvm/book3s_32_mmu.c
 create mode 100644 arch/powerpc/kvm/book3s_64_emulate.c
 create mode 100644 arch/powerpc/kvm/book3s_64_exports.c
 create mode 100644 arch/powerpc/kvm/book3s_64_interrupts.S
 create mode 100644 arch/powerpc/kvm/book3s_64_mmu.c
 create mode 100644 arch/powerpc/kvm/book3s_64_mmu_host.c
 create mode 100644 arch/powerpc/kvm/book3s_64_rmhandlers.S
 create mode 100644 arch/powerpc/kvm/book3s_64_slb.S

^ permalink raw reply	[flat|nested] 244+ messages in thread

* [PATCH 01/27] Move dirty logging code to sub-arch
  2009-10-30 15:47 ` Alexander Graf
  (?)
@ 2009-10-30 15:47     ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti, Olof Johansson,
	linuxppc-dev-mnsaURCQ41sdnm+yROfE0A

PowerPC code currently handles dirty logging in the generic parts. While
that is fine as long as the answer is just "return -ENOTSUPP", we need to
be rather target-specific when actually implementing it.

So let's split it out into implementation-specific code, so we can implement
it for book3s.
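
For a sense of where this is going, here is a rough sketch (not part of
this patch) of the shape the book3s variant could take once the hook
lives in sub-arch code; it leans on the generic kvm_get_dirty_log()
helper and elides locking and the shadow-mapping flush:

int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
{
        int is_dirty = 0;
        int r;

        r = kvm_get_dirty_log(kvm, log, &is_dirty);
        if (!r && is_dirty) {
                /* drop guest mappings for this slot here so that new
                 * writes fault again and get logged */
        }
        return r;
}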

Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
---
 arch/powerpc/kvm/booke.c   |    5 +++++
 arch/powerpc/kvm/powerpc.c |    5 -----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index e7bf4d0..06f5a9e 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -520,6 +520,11 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
 	return kvmppc_core_vcpu_translate(vcpu, tr);
 }
 
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+{
+	return -ENOTSUPP;
+}
+
 int __init kvmppc_booke_init(void)
 {
 	unsigned long ivor[16];
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 5902bbc..4ae3490 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -410,11 +410,6 @@ out:
 	return r;
 }
 
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
-{
-	return -ENOTSUPP;
-}
-
 long kvm_arch_vm_ioctl(struct file *filp,
                        unsigned int ioctl, unsigned long arg)
 {
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 02/27] Pass PVR in sregs
  2009-10-30 15:47 ` Alexander Graf
  (?)
@ 2009-10-30 15:47   ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti,
	Olof Johansson, linuxppc-dev

Right now sregs is unused on PPC, so we can use it for initialization
of the CPU.

KVM on BookE always virtualizes the host CPU. On Book3s we go a step further
and take the PVR from userspace, which tells us what kind of CPU we are supposed
to virtualize, because we support Book3s_32 and Book3s_64 guests.

In order to get that information, we use the sregs ioctl, because we don't
want to reset the guest CPU on every normal register set.
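
For illustration, a minimal userspace sketch (an assumption about usage,
not code from this series; qemu does the equivalent) of how the new field
is meant to be filled in through the existing sregs ioctls:

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* tell KVM which CPU this vcpu should pretend to be */
static int example_set_guest_pvr(int vcpu_fd, __u32 pvr)
{
        struct kvm_sregs sregs;

        if (ioctl(vcpu_fd, KVM_GET_SREGS, &sregs) < 0)
                return -1;
        sregs.pvr = pvr;
        return ioctl(vcpu_fd, KVM_SET_SREGS, &sregs);
}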

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v4 -> v5

  - make PVR 32 bits
---
 arch/powerpc/include/asm/kvm.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm.h b/arch/powerpc/include/asm/kvm.h
index bb2de6a..c9ca97f 100644
--- a/arch/powerpc/include/asm/kvm.h
+++ b/arch/powerpc/include/asm/kvm.h
@@ -46,6 +46,8 @@ struct kvm_regs {
 };
 
 struct kvm_sregs {
+	__u32 pvr;
+	char pad[1020];
 };
 
 struct kvm_fpu {
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 03/27] Add Book3s definitions
  2009-10-30 15:47 ` Alexander Graf
  (?)
@ 2009-10-30 15:47     ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti, Olof Johansson,
	linuxppc-dev-mnsaURCQ41sdnm+yROfE0A

We need quite a few new constants for KVM on Book3s,
so let's define them now.

These constants will be used in later patches.
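
As a hedged sketch of how the IRQPRIO values are meant to be consumed
(illustrative names only, not the helpers this series adds later): each
priority indexes one bit in the vcpu's pending_exceptions word, so
delivery code can pick the highest-priority pending interrupt with a
simple bit scan.

#include <linux/bitops.h>

/* illustrative only */
static inline void example_queue_book3s_irq(struct kvm_vcpu *vcpu,
                                            unsigned int irqprio)
{
        set_bit(irqprio, &vcpu->arch.pending_exceptions);
}

/* e.g.: example_queue_book3s_irq(vcpu, BOOK3S_IRQPRIO_DECREMENTER); */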

Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>

---

v3 -> v4

  - remove old kernel compat code
---
 arch/powerpc/include/asm/kvm_asm.h |   39 ++++++++++++++++++++++++++++++++++++
 1 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
index 56bfae5..19ddb35 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -49,6 +49,45 @@
 #define BOOKE_INTERRUPT_SPE_FP_ROUND 34
 #define BOOKE_INTERRUPT_PERFORMANCE_MONITOR 35
 
+/* book3s */
+
+#define BOOK3S_INTERRUPT_SYSTEM_RESET	0x100
+#define BOOK3S_INTERRUPT_MACHINE_CHECK	0x200
+#define BOOK3S_INTERRUPT_DATA_STORAGE	0x300
+#define BOOK3S_INTERRUPT_DATA_SEGMENT	0x380
+#define BOOK3S_INTERRUPT_INST_STORAGE	0x400
+#define BOOK3S_INTERRUPT_INST_SEGMENT	0x480
+#define BOOK3S_INTERRUPT_EXTERNAL	0x500
+#define BOOK3S_INTERRUPT_ALIGNMENT	0x600
+#define BOOK3S_INTERRUPT_PROGRAM	0x700
+#define BOOK3S_INTERRUPT_FP_UNAVAIL	0x800
+#define BOOK3S_INTERRUPT_DECREMENTER	0x900
+#define BOOK3S_INTERRUPT_SYSCALL	0xc00
+#define BOOK3S_INTERRUPT_TRACE		0xd00
+#define BOOK3S_INTERRUPT_PERFMON	0xf00
+#define BOOK3S_INTERRUPT_ALTIVEC	0xf20
+#define BOOK3S_INTERRUPT_VSX		0xf40
+
+#define BOOK3S_IRQPRIO_SYSTEM_RESET		0
+#define BOOK3S_IRQPRIO_DATA_SEGMENT		1
+#define BOOK3S_IRQPRIO_INST_SEGMENT		2
+#define BOOK3S_IRQPRIO_DATA_STORAGE		3
+#define BOOK3S_IRQPRIO_INST_STORAGE		4
+#define BOOK3S_IRQPRIO_ALIGNMENT		5
+#define BOOK3S_IRQPRIO_PROGRAM			6
+#define BOOK3S_IRQPRIO_FP_UNAVAIL		7
+#define BOOK3S_IRQPRIO_ALTIVEC			8
+#define BOOK3S_IRQPRIO_VSX			9
+#define BOOK3S_IRQPRIO_SYSCALL			10
+#define BOOK3S_IRQPRIO_MACHINE_CHECK		11
+#define BOOK3S_IRQPRIO_DEBUG			12
+#define BOOK3S_IRQPRIO_EXTERNAL			13
+#define BOOK3S_IRQPRIO_DECREMENTER		14
+#define BOOK3S_IRQPRIO_PERFORMANCE_MONITOR	15
+#define BOOK3S_IRQPRIO_MAX			16
+
+#define BOOK3S_HFLAG_DCBZ32			0x1
+
 #define RESUME_FLAG_NV          (1<<0)  /* Reload guest nonvolatile state? */
 #define RESUME_FLAG_HOST        (1<<1)  /* Resume host? */
 
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 04/27] Add Book3s fields to vcpu structs
  2009-10-30 15:47 ` Alexander Graf
  (?)
@ 2009-10-30 15:47   ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti,
	Olof Johansson, linuxppc-dev

We need to store more information than we currently have for vcpus
when running on Book3s.

So let's extend the internal struct definitions.
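
One hedged example of what the new hpte_cache[] and hpte_cache_offset
fields below could be used for (the function name is invented and the
wrap-around policy is only one option; the real handling comes with the
later MMU patches):

/* remember a shadow mapping so it can be found and invalidated later */
static void example_cache_shadow_hpte(struct kvm_vcpu *vcpu,
                                      struct hpte_cache *entry)
{
        int i = vcpu->arch.hpte_cache_offset++;

        if (vcpu->arch.hpte_cache_offset >= HPTEG_CACHE_NUM)
                vcpu->arch.hpte_cache_offset = 0;
        vcpu->arch.hpte_cache[i] = *entry;
}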

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v3 -> v4:

  - use context_id instead of mm_context

v4 -> v5:

  - always include pvr in vcpu struct
---
 arch/powerpc/include/asm/kvm_host.h |   73 ++++++++++++++++++++++++++++++++++-
 1 files changed, 72 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index c9c930e..2cff5fe 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -37,6 +37,8 @@
 #define KVM_NR_PAGE_SIZES	1
 #define KVM_PAGES_PER_HPAGE(x)	(1UL<<31)
 
+#define HPTEG_CACHE_NUM 1024
+
 struct kvm;
 struct kvm_run;
 struct kvm_vcpu;
@@ -63,6 +65,17 @@ struct kvm_vcpu_stat {
 	u32 dec_exits;
 	u32 ext_intr_exits;
 	u32 halt_wakeup;
+#ifdef CONFIG_PPC64
+	u32 pf_storage;
+	u32 pf_instruc;
+	u32 sp_storage;
+	u32 sp_instruc;
+	u32 queue_intr;
+	u32 ld;
+	u32 ld_slow;
+	u32 st;
+	u32 st_slow;
+#endif
 };
 
 enum kvm_exit_types {
@@ -109,9 +122,53 @@ struct kvmppc_exit_timing {
 struct kvm_arch {
 };
 
+struct kvmppc_pte {
+	u64 eaddr;
+	u64 vpage;
+	u64 raddr;
+	bool may_read;
+	bool may_write;
+	bool may_execute;
+};
+
+struct kvmppc_mmu {
+	/* book3s_64 only */
+	void (*slbmte)(struct kvm_vcpu *vcpu, u64 rb, u64 rs);
+	u64  (*slbmfee)(struct kvm_vcpu *vcpu, u64 slb_nr);
+	u64  (*slbmfev)(struct kvm_vcpu *vcpu, u64 slb_nr);
+	void (*slbie)(struct kvm_vcpu *vcpu, u64 slb_nr);
+	void (*slbia)(struct kvm_vcpu *vcpu);
+	/* book3s */
+	void (*mtsrin)(struct kvm_vcpu *vcpu, u32 srnum, ulong value);
+	u32  (*mfsrin)(struct kvm_vcpu *vcpu, u32 srnum);
+	int  (*xlate)(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *pte, bool data);
+	void (*reset_msr)(struct kvm_vcpu *vcpu);
+	void (*tlbie)(struct kvm_vcpu *vcpu, ulong addr, bool large);
+	int  (*esid_to_vsid)(struct kvm_vcpu *vcpu, u64 esid, u64 *vsid);
+	u64  (*ea_to_vp)(struct kvm_vcpu *vcpu, gva_t eaddr, bool data);
+	bool (*is_dcbz32)(struct kvm_vcpu *vcpu);
+};
+
+struct hpte_cache {
+	u64 host_va;
+	u64 pfn;
+	ulong slot;
+	struct kvmppc_pte pte;
+};
+
 struct kvm_vcpu_arch {
-	u32 host_stack;
+	ulong host_stack;
 	u32 host_pid;
+#ifdef CONFIG_PPC64
+	ulong host_msr;
+	ulong host_r2;
+	void *host_retip;
+	ulong trampoline_lowmem;
+	ulong trampoline_enter;
+	ulong highmem_handler;
+	ulong host_paca_phys;
+	struct kvmppc_mmu mmu;
+#endif
 
 	u64 fpr[32];
 	ulong gpr[32];
@@ -123,6 +180,10 @@ struct kvm_vcpu_arch {
 	ulong xer;
 
 	ulong msr;
+#ifdef CONFIG_PPC64
+	ulong shadow_msr;
+	ulong hflags;
+#endif
 	u32 mmucr;
 	ulong sprg0;
 	ulong sprg1;
@@ -149,6 +210,7 @@ struct kvm_vcpu_arch {
 	u32 ivor[64];
 	ulong ivpr;
 	u32 pir;
+	u32 pvr;
 
 	u32 shadow_pid;
 	u32 pid;
@@ -174,6 +236,9 @@ struct kvm_vcpu_arch {
 #endif
 
 	u32 last_inst;
+#ifdef CONFIG_PPC64
+	ulong fault_dsisr;
+#endif
 	ulong fault_dear;
 	ulong fault_esr;
 	gpa_t paddr_accessed;
@@ -186,7 +251,13 @@ struct kvm_vcpu_arch {
 	u32 cpr0_cfgaddr; /* holds the last set cpr0_cfgaddr */
 
 	struct timer_list dec_timer;
+	u64 dec_jiffies;
 	unsigned long pending_exceptions;
+
+#ifdef CONFIG_PPC64
+	struct hpte_cache hpte_cache[HPTEG_CACHE_NUM];
+	int hpte_cache_offset;
+#endif
 };
 
 #endif /* __POWERPC_KVM_HOST_H__ */
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 05/27] Add asm/kvm_book3s.h
  2009-10-30 15:47 ` Alexander Graf
  (?)
@ 2009-10-30 15:47   ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti,
	Olof Johansson, linuxppc-dev

This adds the book3s-specific header file that contains structs that
are only valid in book3s-specific code.
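
A minimal usage sketch (not part of the patch): book3s code recovers the
enclosing kvmppc_vcpu_book3s from a plain kvm_vcpu with the to_book3s()
helper defined here, for example:

/* illustrative only */
static u64 example_read_guest_sdr1(struct kvm_vcpu *vcpu)
{
        struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);

        return vcpu_book3s->sdr1;
}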

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v3 -> v4:

  - use context_id instead of mm_alloc
---
 arch/powerpc/include/asm/kvm_book3s.h |  136 +++++++++++++++++++++++++++++++++
 1 files changed, 136 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kvm_book3s.h

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
new file mode 100644
index 0000000..c601133
--- /dev/null
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -0,0 +1,136 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+#ifndef __ASM_KVM_BOOK3S_H__
+#define __ASM_KVM_BOOK3S_H__
+
+#include <linux/types.h>
+#include <linux/kvm_host.h>
+#include <asm/kvm_ppc.h>
+
+struct kvmppc_slb {
+	u64 esid;
+	u64 vsid;
+	u64 orige;
+	u64 origv;
+	bool valid;
+	bool Ks;
+	bool Kp;
+	bool nx;
+	bool large;
+	bool class;
+};
+
+struct kvmppc_sr {
+	u32 raw;
+	u32 vsid;
+	bool Ks;
+	bool Kp;
+	bool nx;
+};
+
+struct kvmppc_bat {
+	u32 bepi;
+	u32 bepi_mask;
+	bool vs;
+	bool vp;
+	u32 brpn;
+	u8 wimg;
+	u8 pp;
+};
+
+struct kvmppc_sid_map {
+	u64 guest_vsid;
+	u64 guest_esid;
+	u64 host_vsid;
+	bool valid;
+};
+
+#define SID_MAP_BITS    9
+#define SID_MAP_NUM     (1 << SID_MAP_BITS)
+#define SID_MAP_MASK    (SID_MAP_NUM - 1)
+
+struct kvmppc_vcpu_book3s {
+	struct kvm_vcpu vcpu;
+	struct kvmppc_sid_map sid_map[SID_MAP_NUM];
+	struct kvmppc_slb slb[64];
+	struct {
+		u64 esid;
+		u64 vsid;
+	} slb_shadow[64];
+	u8 slb_shadow_max;
+	struct kvmppc_sr sr[16];
+	struct kvmppc_bat ibat[8];
+	struct kvmppc_bat dbat[8];
+	u64 hid[6];
+	int slb_nr;
+	u64 sdr1;
+	u64 dsisr;
+	u64 hior;
+	u64 msr_mask;
+	u64 vsid_first;
+	u64 vsid_next;
+	u64 vsid_max;
+	int context_id;
+};
+
+#define CONTEXT_HOST		0
+#define CONTEXT_GUEST		1
+#define CONTEXT_GUEST_END	2
+
+#define VSID_REAL	0xfffffffffff00000
+#define VSID_REAL_DR	0xffffffffffe00000
+#define VSID_REAL_IR	0xffffffffffd00000
+#define VSID_BAT	0xffffffffffc00000
+#define VSID_PR		0x8000000000000000
+
+extern void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, u64 ea, u64 ea_mask);
+extern void kvmppc_mmu_pte_vflush(struct kvm_vcpu *vcpu, u64 vp, u64 vp_mask);
+extern void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, u64 pa_start, u64 pa_end);
+extern void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 new_msr);
+extern void kvmppc_mmu_book3s_64_init(struct kvm_vcpu *vcpu);
+extern void kvmppc_mmu_book3s_32_init(struct kvm_vcpu *vcpu);
+extern int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte);
+extern int kvmppc_mmu_map_segment(struct kvm_vcpu *vcpu, ulong eaddr);
+extern void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu);
+extern struct kvmppc_pte *kvmppc_mmu_find_pte(struct kvm_vcpu *vcpu, u64 ea, bool data);
+extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong eaddr, int size, void *ptr, bool data);
+extern int kvmppc_st(struct kvm_vcpu *vcpu, ulong eaddr, int size, void *ptr);
+extern void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec);
+
+extern u32 kvmppc_trampoline_lowmem;
+extern u32 kvmppc_trampoline_enter;
+
+static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
+{
+	return container_of(vcpu, struct kvmppc_vcpu_book3s, vcpu);
+}
+
+static inline ulong dsisr(void)
+{
+	ulong r;
+	asm ( "mfdsisr %0 " : "=r" (r) );
+	return r;
+}
+
+extern void kvm_return_point(void);
+
+#define INS_DCBZ			0x7c0007ec
+
+#endif /* __ASM_KVM_BOOK3S_H__ */
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

+
+static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
+{
+	return container_of(vcpu, struct kvmppc_vcpu_book3s, vcpu);
+}
+
+static inline ulong dsisr(void)
+{
+	ulong r;
+	asm ( "mfdsisr %0 " : "=r" (r) );
+	return r;
+}
+
+extern void kvm_return_point(void);
+
+#define INS_DCBZ			0x7c0007ec
+
+#endif /* __ASM_KVM_BOOK3S_H__ */
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 05/27] Add asm/kvm_book3s.h
@ 2009-10-30 15:47   ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti,
	Olof Johansson, linuxppc-dev

This adds the book3s-specific header file that contains structs that
are only valid in book3s-specific code.

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v3 -> v4:

  - use context_id instead of mm_alloc
---
 arch/powerpc/include/asm/kvm_book3s.h |  136 +++++++++++++++++++++++++++++++++
 1 files changed, 136 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kvm_book3s.h

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
new file mode 100644
index 0000000..c601133
--- /dev/null
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -0,0 +1,136 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+#ifndef __ASM_KVM_BOOK3S_H__
+#define __ASM_KVM_BOOK3S_H__
+
+#include <linux/types.h>
+#include <linux/kvm_host.h>
+#include <asm/kvm_ppc.h>
+
+struct kvmppc_slb {
+	u64 esid;
+	u64 vsid;
+	u64 orige;
+	u64 origv;
+	bool valid;
+	bool Ks;
+	bool Kp;
+	bool nx;
+	bool large;
+	bool class;
+};
+
+struct kvmppc_sr {
+	u32 raw;
+	u32 vsid;
+	bool Ks;
+	bool Kp;
+	bool nx;
+};
+
+struct kvmppc_bat {
+	u32 bepi;
+	u32 bepi_mask;
+	bool vs;
+	bool vp;
+	u32 brpn;
+	u8 wimg;
+	u8 pp;
+};
+
+struct kvmppc_sid_map {
+	u64 guest_vsid;
+	u64 guest_esid;
+	u64 host_vsid;
+	bool valid;
+};
+
+#define SID_MAP_BITS    9
+#define SID_MAP_NUM     (1 << SID_MAP_BITS)
+#define SID_MAP_MASK    (SID_MAP_NUM - 1)
+
+struct kvmppc_vcpu_book3s {
+	struct kvm_vcpu vcpu;
+	struct kvmppc_sid_map sid_map[SID_MAP_NUM];
+	struct kvmppc_slb slb[64];
+	struct {
+		u64 esid;
+		u64 vsid;
+	} slb_shadow[64];
+	u8 slb_shadow_max;
+	struct kvmppc_sr sr[16];
+	struct kvmppc_bat ibat[8];
+	struct kvmppc_bat dbat[8];
+	u64 hid[6];
+	int slb_nr;
+	u64 sdr1;
+	u64 dsisr;
+	u64 hior;
+	u64 msr_mask;
+	u64 vsid_first;
+	u64 vsid_next;
+	u64 vsid_max;
+	int context_id;
+};
+
+#define CONTEXT_HOST		0
+#define CONTEXT_GUEST		1
+#define CONTEXT_GUEST_END	2
+
+#define VSID_REAL	0xfffffffffff00000
+#define VSID_REAL_DR	0xffffffffffe00000
+#define VSID_REAL_IR	0xffffffffffd00000
+#define VSID_BAT	0xffffffffffc00000
+#define VSID_PR		0x8000000000000000
+
+extern void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, u64 ea, u64 ea_mask);
+extern void kvmppc_mmu_pte_vflush(struct kvm_vcpu *vcpu, u64 vp, u64 vp_mask);
+extern void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, u64 pa_start, u64 pa_end);
+extern void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 new_msr);
+extern void kvmppc_mmu_book3s_64_init(struct kvm_vcpu *vcpu);
+extern void kvmppc_mmu_book3s_32_init(struct kvm_vcpu *vcpu);
+extern int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte);
+extern int kvmppc_mmu_map_segment(struct kvm_vcpu *vcpu, ulong eaddr);
+extern void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu);
+extern struct kvmppc_pte *kvmppc_mmu_find_pte(struct kvm_vcpu *vcpu, u64 ea, bool data);
+extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong eaddr, int size, void *ptr, bool data);
+extern int kvmppc_st(struct kvm_vcpu *vcpu, ulong eaddr, int size, void *ptr);
+extern void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec);
+
+extern u32 kvmppc_trampoline_lowmem;
+extern u32 kvmppc_trampoline_enter;
+
+static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
+{
+	return container_of(vcpu, struct kvmppc_vcpu_book3s, vcpu);
+}
+
+static inline ulong dsisr(void)
+{
+	ulong r;
+	asm ( "mfdsisr %0 " : "=r" (r) );
+	return r;
+}
+
+extern void kvm_return_point(void);
+
+#define INS_DCBZ			0x7c0007ec
+
+#endif /* __ASM_KVM_BOOK3S_H__ */
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 06/27] Add Book3s_64 intercept helpers
  2009-10-30 15:47 ` Alexander Graf
  (?)
@ 2009-10-30 15:47   ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti,
	Olof Johansson, linuxppc-dev

We need to intercept interrupt vectors. To do that, let's add a file
we can always include which only activates the intercepts when we have
them configured.

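As an illustration of where those vector numbers end up (this sketch is not
part of the patch): the same BOOK3S_INTERRUPT_* values that DO_KVM filters on
come back to C code as the exit reason, where the C exit handler
(kvmppc_handle_exit, added later in this series) dispatches on them roughly
like this:

/* Sketch only; not the real kvmppc_handle_exit. */
static int example_dispatch(struct kvm_vcpu *vcpu, unsigned int exit_nr)
{
	switch (exit_nr) {
	case BOOK3S_INTERRUPT_EXTERNAL:
	case BOOK3S_INTERRUPT_DECREMENTER:
		/* the exit path already replayed the host interrupt */
		return RESUME_GUEST;
	case BOOK3S_INTERRUPT_PROGRAM:
		/* emulate the instruction or reflect the trap into the guest */
		return RESUME_GUEST;
	default:
		return RESUME_HOST;	/* unhandled exit: bail out of guest mode */
	}
}
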
Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/include/asm/kvm_book3s_64_asm.h |   58 ++++++++++++++++++++++++++
 1 files changed, 58 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kvm_book3s_64_asm.h

diff --git a/arch/powerpc/include/asm/kvm_book3s_64_asm.h b/arch/powerpc/include/asm/kvm_book3s_64_asm.h
new file mode 100644
index 0000000..2e06ee8
--- /dev/null
+++ b/arch/powerpc/include/asm/kvm_book3s_64_asm.h
@@ -0,0 +1,58 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+#ifndef __ASM_KVM_BOOK3S_ASM_H__
+#define __ASM_KVM_BOOK3S_ASM_H__
+
+#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
+
+#include <asm/kvm_asm.h>
+
+.macro DO_KVM intno
+	.if (\intno == BOOK3S_INTERRUPT_SYSTEM_RESET) || \
+	    (\intno == BOOK3S_INTERRUPT_MACHINE_CHECK) || \
+	    (\intno == BOOK3S_INTERRUPT_DATA_STORAGE) || \
+	    (\intno == BOOK3S_INTERRUPT_INST_STORAGE) || \
+	    (\intno == BOOK3S_INTERRUPT_DATA_SEGMENT) || \
+	    (\intno == BOOK3S_INTERRUPT_INST_SEGMENT) || \
+	    (\intno == BOOK3S_INTERRUPT_EXTERNAL) || \
+	    (\intno == BOOK3S_INTERRUPT_ALIGNMENT) || \
+	    (\intno == BOOK3S_INTERRUPT_PROGRAM) || \
+	    (\intno == BOOK3S_INTERRUPT_FP_UNAVAIL) || \
+	    (\intno == BOOK3S_INTERRUPT_DECREMENTER) || \
+	    (\intno == BOOK3S_INTERRUPT_SYSCALL) || \
+	    (\intno == BOOK3S_INTERRUPT_TRACE) || \
+	    (\intno == BOOK3S_INTERRUPT_PERFMON) || \
+	    (\intno == BOOK3S_INTERRUPT_ALTIVEC) || \
+	    (\intno == BOOK3S_INTERRUPT_VSX)
+
+	b	kvmppc_trampoline_\intno
+kvmppc_resume_\intno:
+
+	.endif
+.endm
+
+#else
+
+.macro DO_KVM intno
+.endm
+
+#endif /* CONFIG_KVM_BOOK3S_64_HANDLER */
+
+#endif /* __ASM_KVM_BOOK3S_ASM_H__ */
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 06/27] Add Book3s_64 intercept helpers
@ 2009-10-30 15:47   ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Kevin Wolf, Arnd Bergmann, Hollis Blanchard, Marcelo Tosatti,
	kvm-ppc, linuxppc-dev, Avi Kivity, bphilips, Olof Johansson

We need to intercept interrupt vectors. To do that, let's add a file
we can always include which only activates the intercepts when we have
them configured.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/include/asm/kvm_book3s_64_asm.h |   58 ++++++++++++++++++++++++++
 1 files changed, 58 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kvm_book3s_64_asm.h

diff --git a/arch/powerpc/include/asm/kvm_book3s_64_asm.h b/arch/powerpc/include/asm/kvm_book3s_64_asm.h
new file mode 100644
index 0000000..2e06ee8
--- /dev/null
+++ b/arch/powerpc/include/asm/kvm_book3s_64_asm.h
@@ -0,0 +1,58 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+#ifndef __ASM_KVM_BOOK3S_ASM_H__
+#define __ASM_KVM_BOOK3S_ASM_H__
+
+#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
+
+#include <asm/kvm_asm.h>
+
+.macro DO_KVM intno
+	.if (\intno == BOOK3S_INTERRUPT_SYSTEM_RESET) || \
+	    (\intno == BOOK3S_INTERRUPT_MACHINE_CHECK) || \
+	    (\intno == BOOK3S_INTERRUPT_DATA_STORAGE) || \
+	    (\intno == BOOK3S_INTERRUPT_INST_STORAGE) || \
+	    (\intno == BOOK3S_INTERRUPT_DATA_SEGMENT) || \
+	    (\intno == BOOK3S_INTERRUPT_INST_SEGMENT) || \
+	    (\intno == BOOK3S_INTERRUPT_EXTERNAL) || \
+	    (\intno == BOOK3S_INTERRUPT_ALIGNMENT) || \
+	    (\intno == BOOK3S_INTERRUPT_PROGRAM) || \
+	    (\intno == BOOK3S_INTERRUPT_FP_UNAVAIL) || \
+	    (\intno == BOOK3S_INTERRUPT_DECREMENTER) || \
+	    (\intno == BOOK3S_INTERRUPT_SYSCALL) || \
+	    (\intno == BOOK3S_INTERRUPT_TRACE) || \
+	    (\intno == BOOK3S_INTERRUPT_PERFMON) || \
+	    (\intno == BOOK3S_INTERRUPT_ALTIVEC) || \
+	    (\intno == BOOK3S_INTERRUPT_VSX)
+
+	b	kvmppc_trampoline_\intno
+kvmppc_resume_\intno:
+
+	.endif
+.endm
+
+#else
+
+.macro DO_KVM intno
+.endm
+
+#endif /* CONFIG_KVM_BOOK3S_64_HANDLER */
+
+#endif /* __ASM_KVM_BOOK3S_ASM_H__ */
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 06/27] Add Book3s_64 intercept helpers
@ 2009-10-30 15:47   ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti,
	Olof Johansson, linuxppc-dev

We need to intercept interrupt vectors. To do that, let's add a file
we can always include which only activates the intercepts when we have
them configured.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/include/asm/kvm_book3s_64_asm.h |   58 ++++++++++++++++++++++++++
 1 files changed, 58 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kvm_book3s_64_asm.h

diff --git a/arch/powerpc/include/asm/kvm_book3s_64_asm.h b/arch/powerpc/include/asm/kvm_book3s_64_asm.h
new file mode 100644
index 0000000..2e06ee8
--- /dev/null
+++ b/arch/powerpc/include/asm/kvm_book3s_64_asm.h
@@ -0,0 +1,58 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+#ifndef __ASM_KVM_BOOK3S_ASM_H__
+#define __ASM_KVM_BOOK3S_ASM_H__
+
+#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
+
+#include <asm/kvm_asm.h>
+
+.macro DO_KVM intno
+	.if (\intno == BOOK3S_INTERRUPT_SYSTEM_RESET) || \
+	    (\intno == BOOK3S_INTERRUPT_MACHINE_CHECK) || \
+	    (\intno == BOOK3S_INTERRUPT_DATA_STORAGE) || \
+	    (\intno == BOOK3S_INTERRUPT_INST_STORAGE) || \
+	    (\intno == BOOK3S_INTERRUPT_DATA_SEGMENT) || \
+	    (\intno == BOOK3S_INTERRUPT_INST_SEGMENT) || \
+	    (\intno == BOOK3S_INTERRUPT_EXTERNAL) || \
+	    (\intno == BOOK3S_INTERRUPT_ALIGNMENT) || \
+	    (\intno == BOOK3S_INTERRUPT_PROGRAM) || \
+	    (\intno == BOOK3S_INTERRUPT_FP_UNAVAIL) || \
+	    (\intno == BOOK3S_INTERRUPT_DECREMENTER) || \
+	    (\intno == BOOK3S_INTERRUPT_SYSCALL) || \
+	    (\intno == BOOK3S_INTERRUPT_TRACE) || \
+	    (\intno == BOOK3S_INTERRUPT_PERFMON) || \
+	    (\intno == BOOK3S_INTERRUPT_ALTIVEC) || \
+	    (\intno == BOOK3S_INTERRUPT_VSX)
+
+	b	kvmppc_trampoline_\intno
+kvmppc_resume_\intno:
+
+	.endif
+.endm
+
+#else
+
+.macro DO_KVM intno
+.endm
+
+#endif /* CONFIG_KVM_BOOK3S_64_HANDLER */
+
+#endif /* __ASM_KVM_BOOK3S_ASM_H__ */
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 07/27] Add book3s_64 highmem asm code
  2009-10-30 15:47 ` Alexander Graf
  (?)
@ 2009-10-30 15:47     ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti, Olof Johansson,
	linuxppc-dev-mnsaURCQ41sdnm+yROfE0A

This is the entry / exit code. In order to switch between host and guest
context, we need to switch register state and call the exit code handler on
exit.

This assembly file does exactly that. To finally enter the guest it calls
into book3s_64_slb.S. On exit it gets jumped at from book3s_64_slb.S too.

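From C code the whole file boils down to one call (sketch only; the extern
below mirrors the register interface documented in the asm, and
kvmppc_handle_exit is the C exit handler this code branches to at
kvm_return_point):

/* r3 = kvm_run, r4 = vcpu, as the comment above kvm_start_entry says. */
extern int __kvmppc_vcpu_entry(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);

static int run_vcpu_example(struct kvm_run *run, struct kvm_vcpu *vcpu)
{
	/*
	 * Saves host state, loads guest state and RFIs into the guest via
	 * the lowmem trampoline and book3s_64_slb.S.  On exit,
	 * kvmppc_handler_highmem restores host state and calls
	 * kvmppc_handle_exit; re-entries on RESUME_GUEST loop inside the
	 * asm, so this only returns once we leave guest mode for good.
	 */
	return __kvmppc_vcpu_entry(run, vcpu);
}
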
Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>

---

v3 -> v4:

  - header rename fix
---
 arch/powerpc/include/asm/kvm_ppc.h      |    1 +
 arch/powerpc/kvm/book3s_64_interrupts.S |  392 +++++++++++++++++++++++++++++++
 2 files changed, 393 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_interrupts.S

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 2c6ee34..269ee46 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -39,6 +39,7 @@ enum emulation_result {
 extern int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
 extern char kvmppc_handlers_start[];
 extern unsigned long kvmppc_handler_len;
+extern void kvmppc_handler_highmem(void);
 
 extern void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu);
 extern int kvmppc_handle_load(struct kvm_run *run, struct kvm_vcpu *vcpu,
diff --git a/arch/powerpc/kvm/book3s_64_interrupts.S b/arch/powerpc/kvm/book3s_64_interrupts.S
new file mode 100644
index 0000000..7b55d80
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_interrupts.S
@@ -0,0 +1,392 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
+ */
+
+#include <asm/ppc_asm.h>
+#include <asm/kvm_asm.h>
+#include <asm/reg.h>
+#include <asm/page.h>
+#include <asm/asm-offsets.h>
+#include <asm/exception-64s.h>
+
+#define KVMPPC_HANDLE_EXIT .kvmppc_handle_exit
+#define ULONG_SIZE 8
+#define VCPU_GPR(n)     (VCPU_GPRS + (n * ULONG_SIZE))
+
+.macro mfpaca tmp_reg, src_reg, offset, vcpu_reg
+	ld	\tmp_reg, (PACA_EXMC+\offset)(r13)
+	std	\tmp_reg, VCPU_GPR(\src_reg)(\vcpu_reg)
+.endm
+
+.macro DISABLE_INTERRUPTS
+       mfmsr   r0
+       rldicl  r0,r0,48,1
+       rotldi  r0,r0,16
+       mtmsrd  r0,1
+.endm
+
+/*****************************************************************************
+ *                                                                           *
+ *     Guest entry / exit code that is in kernel module memory (highmem)     *
+ *                                                                           *
+ ****************************************************************************/
+
+/* Registers:
+ *  r3: kvm_run pointer
+ *  r4: vcpu pointer
+ */
+_GLOBAL(__kvmppc_vcpu_entry)
+
+kvm_start_entry:
+	/* Write correct stack frame */
+	mflr    r0
+	std     r0,16(r1)
+
+	/* Save host state to the stack */
+	stdu	r1, -SWITCH_FRAME_SIZE(r1)
+
+	/* Save r3 (kvm_run) and r4 (vcpu) */
+	SAVE_2GPRS(3, r1)
+
+	/* Save non-volatile registers (r14 - r31) */
+	SAVE_NVGPRS(r1)
+
+	/* Save LR */
+	mflr	r14
+	std	r14, _LINK(r1)
+
+/* XXX optimize non-volatile loading away */
+kvm_start_lightweight:
+
+	DISABLE_INTERRUPTS
+
+	/* Save R1/R2 in the PACA */
+	std	r1, PACAR1(r13)
+	std	r2, (PACA_EXMC+EX_SRR0)(r13)
+	ld	r3, VCPU_HIGHMEM_HANDLER(r4)
+	std	r3, PACASAVEDMSR(r13)
+
+	/* Load non-volatile guest state from the vcpu */
+	ld	r14, VCPU_GPR(r14)(r4)
+	ld	r15, VCPU_GPR(r15)(r4)
+	ld	r16, VCPU_GPR(r16)(r4)
+	ld	r17, VCPU_GPR(r17)(r4)
+	ld	r18, VCPU_GPR(r18)(r4)
+	ld	r19, VCPU_GPR(r19)(r4)
+	ld	r20, VCPU_GPR(r20)(r4)
+	ld	r21, VCPU_GPR(r21)(r4)
+	ld	r22, VCPU_GPR(r22)(r4)
+	ld	r23, VCPU_GPR(r23)(r4)
+	ld	r24, VCPU_GPR(r24)(r4)
+	ld	r25, VCPU_GPR(r25)(r4)
+	ld	r26, VCPU_GPR(r26)(r4)
+	ld	r27, VCPU_GPR(r27)(r4)
+	ld	r28, VCPU_GPR(r28)(r4)
+	ld	r29, VCPU_GPR(r29)(r4)
+	ld	r30, VCPU_GPR(r30)(r4)
+	ld	r31, VCPU_GPR(r31)(r4)
+
+	ld	r9, VCPU_PC(r4)			/* r9 = vcpu->arch.pc */
+	ld	r10, VCPU_SHADOW_MSR(r4)	/* r10 = vcpu->arch.shadow_msr */
+
+	ld	r3, VCPU_TRAMPOLINE_ENTER(r4)
+	mtsrr0	r3
+
+	LOAD_REG_IMMEDIATE(r3, MSR_KERNEL & ~(MSR_IR | MSR_DR))
+	mtsrr1	r3
+
+	/* Load guest state in the respective registers */
+	lwz	r3, VCPU_CR(r4)		/* r3 = vcpu->arch.cr */
+	stw	r3, (PACA_EXMC + EX_CCR)(r13)
+
+	ld	r3, VCPU_CTR(r4)	/* r3 = vcpu->arch.ctr */
+	mtctr	r3			/* CTR = r3 */
+
+	ld	r3, VCPU_LR(r4)		/* r3 = vcpu->arch.lr */
+	mtlr	r3			/* LR = r3 */
+
+	ld	r3, VCPU_XER(r4)	/* r3 = vcpu->arch.xer */
+	std	r3, (PACA_EXMC + EX_R3)(r13)
+
+	/* Some guests may need to have dcbz set to 32 byte length.
+	 *
+	 * Usually we ensure that by patching the guest's instructions
+	 * to trap on dcbz and emulate it in the hypervisor.
+	 *
+	 * If we can, we should tell the CPU to use 32 byte dcbz though,
+	 * because that's a lot faster.
+	 */
+
+	ld	r3, VCPU_HFLAGS(r4)
+	rldicl.	r3, r3, 0, 63		/* CR = ((r3 & 1) == 0) */
+	beq	no_dcbz32_on
+
+	mfspr   r3,SPRN_HID5
+	ori     r3, r3, 0x80		/* XXX HID5_dcbz32 = 0x80 */
+	mtspr   SPRN_HID5,r3
+
+no_dcbz32_on:
+	/*	Load guest GPRs */
+
+	ld	r3, VCPU_GPR(r9)(r4)
+	std	r3, (PACA_EXMC + EX_R9)(r13)
+	ld	r3, VCPU_GPR(r10)(r4)
+	std	r3, (PACA_EXMC + EX_R10)(r13)
+	ld	r3, VCPU_GPR(r11)(r4)
+	std	r3, (PACA_EXMC + EX_R11)(r13)
+	ld	r3, VCPU_GPR(r12)(r4)
+	std	r3, (PACA_EXMC + EX_R12)(r13)
+	ld	r3, VCPU_GPR(r13)(r4)
+	std	r3, (PACA_EXMC + EX_R13)(r13)
+
+	ld	r0, VCPU_GPR(r0)(r4)
+	ld	r1, VCPU_GPR(r1)(r4)
+	ld	r2, VCPU_GPR(r2)(r4)
+	ld	r3, VCPU_GPR(r3)(r4)
+	ld	r5, VCPU_GPR(r5)(r4)
+	ld	r6, VCPU_GPR(r6)(r4)
+	ld	r7, VCPU_GPR(r7)(r4)
+	ld	r8, VCPU_GPR(r8)(r4)
+	ld	r4, VCPU_GPR(r4)(r4)
+
+	/* This sets the Magic value for the trampoline */
+
+	li	r11, 1
+	stb	r11, PACA_KVM_IN_GUEST(r13)
+
+	/* Jump to SLB patching handler and into our guest */
+	RFI
+
+/*
+ * This is the handler in module memory. It gets jumped at from the
+ * lowmem trampoline code, so it's basically the guest exit code.
+ *
+ */
+
+.global kvmppc_handler_highmem
+kvmppc_handler_highmem:
+
+	/*
+	 * Register usage at this point:
+	 *
+	 * R00   = guest R13
+	 * R01   = host R1
+	 * R02   = host R2
+	 * R10   = guest PC
+	 * R11   = guest MSR
+	 * R12   = exit handler id
+	 * R13   = PACA
+	 * PACA.exmc.R9    = guest R1
+	 * PACA.exmc.R10   = guest R10
+	 * PACA.exmc.R11   = guest R11
+	 * PACA.exmc.R12   = guest R12
+	 * PACA.exmc.R13   = guest R2
+	 * PACA.exmc.DAR   = guest DAR
+	 * PACA.exmc.DSISR = guest DSISR
+	 * PACA.exmc.LR    = guest instruction
+	 * PACA.exmc.CCR   = guest CR
+	 * PACA.exmc.SRR0  = guest R0
+	 *
+	 */
+
+	std	r3, (PACA_EXMC+EX_R3)(r13)
+
+	/* save the exit id in R3 */
+	mr	r3, r12
+
+	/* R12 = vcpu */
+	ld	r12, GPR4(r1)
+
+	/* Now save the guest state */
+
+	std	r0, VCPU_GPR(r13)(r12)
+	std	r4, VCPU_GPR(r4)(r12)
+	std	r5, VCPU_GPR(r5)(r12)
+	std	r6, VCPU_GPR(r6)(r12)
+	std	r7, VCPU_GPR(r7)(r12)
+	std	r8, VCPU_GPR(r8)(r12)
+	std	r9, VCPU_GPR(r9)(r12)
+
+	/* get registers from PACA */
+	mfpaca	r5, r0, EX_SRR0, r12
+	mfpaca	r5, r3, EX_R3, r12
+	mfpaca	r5, r1, EX_R9, r12
+	mfpaca	r5, r10, EX_R10, r12
+	mfpaca	r5, r11, EX_R11, r12
+	mfpaca	r5, r12, EX_R12, r12
+	mfpaca	r5, r2, EX_R13, r12
+
+	lwz	r5, (PACA_EXMC+EX_LR)(r13)
+	stw	r5, VCPU_LAST_INST(r12)
+
+	lwz	r5, (PACA_EXMC+EX_CCR)(r13)
+	stw	r5, VCPU_CR(r12)
+
+	ld	r5, VCPU_HFLAGS(r12)
+	rldicl.	r5, r5, 0, 63		/* CR = ((r5 & 1) == 0) */
+	beq	no_dcbz32_off
+
+	mfspr   r5,SPRN_HID5
+	rldimi  r5,r5,6,56
+	mtspr   SPRN_HID5,r5
+
+no_dcbz32_off:
+
+	/* XXX maybe skip on lightweight? */
+	std	r14, VCPU_GPR(r14)(r12)
+	std	r15, VCPU_GPR(r15)(r12)
+	std	r16, VCPU_GPR(r16)(r12)
+	std	r17, VCPU_GPR(r17)(r12)
+	std	r18, VCPU_GPR(r18)(r12)
+	std	r19, VCPU_GPR(r19)(r12)
+	std	r20, VCPU_GPR(r20)(r12)
+	std	r21, VCPU_GPR(r21)(r12)
+	std	r22, VCPU_GPR(r22)(r12)
+	std	r23, VCPU_GPR(r23)(r12)
+	std	r24, VCPU_GPR(r24)(r12)
+	std	r25, VCPU_GPR(r25)(r12)
+	std	r26, VCPU_GPR(r26)(r12)
+	std	r27, VCPU_GPR(r27)(r12)
+	std	r28, VCPU_GPR(r28)(r12)
+	std	r29, VCPU_GPR(r29)(r12)
+	std	r30, VCPU_GPR(r30)(r12)
+	std	r31, VCPU_GPR(r31)(r12)
+
+	/* Restore non-volatile host registers (r14 - r31) */
+	REST_NVGPRS(r1)
+
+	/* Save guest PC (R10) */
+	std	r10, VCPU_PC(r12)
+
+	/* Save guest msr (R11) */
+	std	r11, VCPU_SHADOW_MSR(r12)
+
+	/* Save guest CTR (in R12) */
+	mfctr	r5
+	std	r5, VCPU_CTR(r12)
+
+	/* Save guest LR */
+	mflr	r5
+	std	r5, VCPU_LR(r12)
+
+	/* Save guest XER */
+	mfxer	r5
+	std	r5, VCPU_XER(r12)
+
+	/* Save guest DAR */
+	ld	r5, (PACA_EXMC+EX_DAR)(r13)
+	std	r5, VCPU_FAULT_DEAR(r12)
+
+	/* Save guest DSISR */
+	lwz	r5, (PACA_EXMC+EX_DSISR)(r13)
+	std	r5, VCPU_FAULT_DSISR(r12)
+
+	/* Restore host msr -> SRR1 */
+	ld	r7, VCPU_HOST_MSR(r12)
+	mtsrr1	r7
+
+	/* Restore host IP -> SRR0 */
+	ld	r6, VCPU_HOST_RETIP(r12)
+	mtsrr0	r6
+
+	/*
+	 * For some interrupts, we need to call the real Linux
+	 * handler, so it can do work for us. This has to happen
+	 * as if the interrupt arrived from the kernel though,
+	 * so let's fake it here where most state is restored.
+	 *
+	 * Call Linux for hardware interrupts/decrementer
+	 * r3 = address of interrupt handler (exit reason)
+	 */
+
+	cmpwi	r3, BOOK3S_INTERRUPT_EXTERNAL
+	beq	call_linux_handler
+	cmpwi	r3, BOOK3S_INTERRUPT_DECREMENTER
+	beq	call_linux_handler
+
+	/* Back to Interruptible Mode! (goto kvm_return_point) */
+	RFI
+
+call_linux_handler:
+
+	/*
+	 * If we land here we need to jump back to the handler we
+	 * came from.
+	 *
+	 * We have a page that we can access from real mode, so let's
+	 * jump back to that and use it as a trampoline to get back into the
+	 * interrupt handler!
+	 *
+	 * R3 still contains the exit code,
+	 * R6 VCPU_HOST_RETIP and
+	 * R7 VCPU_HOST_MSR
+	 */
+
+	mtlr	r3
+
+	ld	r5, VCPU_TRAMPOLINE_LOWMEM(r12)
+	mtsrr0	r5
+	LOAD_REG_IMMEDIATE(r5, MSR_KERNEL & ~(MSR_IR | MSR_DR))
+	mtsrr1	r5
+
+	RFI
+
+.global kvm_return_point
+kvm_return_point:
+
+	/* Jump back to lightweight entry if we're supposed to */
+	/* go back into the guest */
+	mr	r5, r3
+	/* Restore r3 (kvm_run) and r4 (vcpu) */
+	REST_2GPRS(3, r1)
+	bl	KVMPPC_HANDLE_EXIT
+
+#if 0 /* XXX get lightweight exits back */
+	cmpwi	r3, RESUME_GUEST
+	bne	kvm_exit_heavyweight
+
+	/* put VCPU and KVM_RUN back into place and roll again! */
+	REST_2GPRS(3, r1)
+	b	kvm_start_lightweight
+
+kvm_exit_heavyweight:
+	/* Restore non-volatile host registers */
+	ld	r14, _LINK(r1)
+	mtlr	r14
+	REST_NVGPRS(r1)
+
+	addi    r1, r1, SWITCH_FRAME_SIZE
+#else
+	ld	r4, _LINK(r1)
+	mtlr	r4
+
+	cmpwi	r3, RESUME_GUEST
+	bne	kvm_exit_heavyweight
+
+	REST_2GPRS(3, r1)
+
+	addi    r1, r1, SWITCH_FRAME_SIZE
+
+	b	kvm_start_entry
+
+kvm_exit_heavyweight:
+
+	addi    r1, r1, SWITCH_FRAME_SIZE
+#endif
+
+	blr
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 07/27] Add book3s_64 highmem asm code
@ 2009-10-30 15:47     ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Kevin Wolf, Arnd Bergmann, Hollis Blanchard, Marcelo Tosatti,
	kvm-ppc, linuxppc-dev, Avi Kivity, bphilips, Olof Johansson

This is the entry / exit code. In order to switch between host and guest
context, we need to switch register state and call the exit code handler on
exit.

This assembly file does exactly that. To finally enter the guest it calls
into book3s_64_slb.S. On exit it gets jumped at from book3s_64_slb.S too.

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v3 -> v4:

  - header rename fix
---
 arch/powerpc/include/asm/kvm_ppc.h      |    1 +
 arch/powerpc/kvm/book3s_64_interrupts.S |  392 +++++++++++++++++++++++++++++++
 2 files changed, 393 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_interrupts.S

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 2c6ee34..269ee46 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -39,6 +39,7 @@ enum emulation_result {
 extern int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
 extern char kvmppc_handlers_start[];
 extern unsigned long kvmppc_handler_len;
+extern void kvmppc_handler_highmem(void);
 
 extern void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu);
 extern int kvmppc_handle_load(struct kvm_run *run, struct kvm_vcpu *vcpu,
diff --git a/arch/powerpc/kvm/book3s_64_interrupts.S b/arch/powerpc/kvm/book3s_64_interrupts.S
new file mode 100644
index 0000000..7b55d80
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_interrupts.S
@@ -0,0 +1,392 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+#include <asm/ppc_asm.h>
+#include <asm/kvm_asm.h>
+#include <asm/reg.h>
+#include <asm/page.h>
+#include <asm/asm-offsets.h>
+#include <asm/exception-64s.h>
+
+#define KVMPPC_HANDLE_EXIT .kvmppc_handle_exit
+#define ULONG_SIZE 8
+#define VCPU_GPR(n)     (VCPU_GPRS + (n * ULONG_SIZE))
+
+.macro mfpaca tmp_reg, src_reg, offset, vcpu_reg
+	ld	\tmp_reg, (PACA_EXMC+\offset)(r13)
+	std	\tmp_reg, VCPU_GPR(\src_reg)(\vcpu_reg)
+.endm
+
+.macro DISABLE_INTERRUPTS
+       mfmsr   r0
+       rldicl  r0,r0,48,1
+       rotldi  r0,r0,16
+       mtmsrd  r0,1
+.endm
+
+/*****************************************************************************
+ *                                                                           *
+ *     Guest entry / exit code that is in kernel module memory (highmem)     *
+ *                                                                           *
+ ****************************************************************************/
+
+/* Registers:
+ *  r3: kvm_run pointer
+ *  r4: vcpu pointer
+ */
+_GLOBAL(__kvmppc_vcpu_entry)
+
+kvm_start_entry:
+	/* Write correct stack frame */
+	mflr    r0
+	std     r0,16(r1)
+
+	/* Save host state to the stack */
+	stdu	r1, -SWITCH_FRAME_SIZE(r1)
+
+	/* Save r3 (kvm_run) and r4 (vcpu) */
+	SAVE_2GPRS(3, r1)
+
+	/* Save non-volatile registers (r14 - r31) */
+	SAVE_NVGPRS(r1)
+
+	/* Save LR */
+	mflr	r14
+	std	r14, _LINK(r1)
+
+/* XXX optimize non-volatile loading away */
+kvm_start_lightweight:
+
+	DISABLE_INTERRUPTS
+
+	/* Save R1/R2 in the PACA */
+	std	r1, PACAR1(r13)
+	std	r2, (PACA_EXMC+EX_SRR0)(r13)
+	ld	r3, VCPU_HIGHMEM_HANDLER(r4)
+	std	r3, PACASAVEDMSR(r13)
+
+	/* Load non-volatile guest state from the vcpu */
+	ld	r14, VCPU_GPR(r14)(r4)
+	ld	r15, VCPU_GPR(r15)(r4)
+	ld	r16, VCPU_GPR(r16)(r4)
+	ld	r17, VCPU_GPR(r17)(r4)
+	ld	r18, VCPU_GPR(r18)(r4)
+	ld	r19, VCPU_GPR(r19)(r4)
+	ld	r20, VCPU_GPR(r20)(r4)
+	ld	r21, VCPU_GPR(r21)(r4)
+	ld	r22, VCPU_GPR(r22)(r4)
+	ld	r23, VCPU_GPR(r23)(r4)
+	ld	r24, VCPU_GPR(r24)(r4)
+	ld	r25, VCPU_GPR(r25)(r4)
+	ld	r26, VCPU_GPR(r26)(r4)
+	ld	r27, VCPU_GPR(r27)(r4)
+	ld	r28, VCPU_GPR(r28)(r4)
+	ld	r29, VCPU_GPR(r29)(r4)
+	ld	r30, VCPU_GPR(r30)(r4)
+	ld	r31, VCPU_GPR(r31)(r4)
+
+	ld	r9, VCPU_PC(r4)			/* r9 = vcpu->arch.pc */
+	ld	r10, VCPU_SHADOW_MSR(r4)	/* r10 = vcpu->arch.shadow_msr */
+
+	ld	r3, VCPU_TRAMPOLINE_ENTER(r4)
+	mtsrr0	r3
+
+	LOAD_REG_IMMEDIATE(r3, MSR_KERNEL & ~(MSR_IR | MSR_DR))
+	mtsrr1	r3
+
+	/* Load guest state in the respective registers */
+	lwz	r3, VCPU_CR(r4)		/* r3 = vcpu->arch.cr */
+	stw	r3, (PACA_EXMC + EX_CCR)(r13)
+
+	ld	r3, VCPU_CTR(r4)	/* r3 = vcpu->arch.ctr */
+	mtctr	r3			/* CTR = r3 */
+
+	ld	r3, VCPU_LR(r4)		/* r3 = vcpu->arch.lr */
+	mtlr	r3			/* LR = r3 */
+
+	ld	r3, VCPU_XER(r4)	/* r3 = vcpu->arch.xer */
+	std	r3, (PACA_EXMC + EX_R3)(r13)
+
+	/* Some guests may need to have dcbz set to 32 byte length.
+	 *
+	 * Usually we ensure that by patching the guest's instructions
+	 * to trap on dcbz and emulate it in the hypervisor.
+	 *
+	 * If we can, we should tell the CPU to use 32 byte dcbz though,
+	 * because that's a lot faster.
+	 */
+
+	ld	r3, VCPU_HFLAGS(r4)
+	rldicl.	r3, r3, 0, 63		/* CR = ((r3 & 1) == 0) */
+	beq	no_dcbz32_on
+
+	mfspr   r3,SPRN_HID5
+	ori     r3, r3, 0x80		/* XXX HID5_dcbz32 = 0x80 */
+	mtspr   SPRN_HID5,r3
+
+no_dcbz32_on:
+	/*	Load guest GPRs */
+
+	ld	r3, VCPU_GPR(r9)(r4)
+	std	r3, (PACA_EXMC + EX_R9)(r13)
+	ld	r3, VCPU_GPR(r10)(r4)
+	std	r3, (PACA_EXMC + EX_R10)(r13)
+	ld	r3, VCPU_GPR(r11)(r4)
+	std	r3, (PACA_EXMC + EX_R11)(r13)
+	ld	r3, VCPU_GPR(r12)(r4)
+	std	r3, (PACA_EXMC + EX_R12)(r13)
+	ld	r3, VCPU_GPR(r13)(r4)
+	std	r3, (PACA_EXMC + EX_R13)(r13)
+
+	ld	r0, VCPU_GPR(r0)(r4)
+	ld	r1, VCPU_GPR(r1)(r4)
+	ld	r2, VCPU_GPR(r2)(r4)
+	ld	r3, VCPU_GPR(r3)(r4)
+	ld	r5, VCPU_GPR(r5)(r4)
+	ld	r6, VCPU_GPR(r6)(r4)
+	ld	r7, VCPU_GPR(r7)(r4)
+	ld	r8, VCPU_GPR(r8)(r4)
+	ld	r4, VCPU_GPR(r4)(r4)
+
+	/* This sets the Magic value for the trampoline */
+
+	li	r11, 1
+	stb	r11, PACA_KVM_IN_GUEST(r13)
+
+	/* Jump to SLB patching handler and into our guest */
+	RFI
+
+/*
+ * This is the handler in module memory. It gets jumped at from the
+ * lowmem trampoline code, so it's basically the guest exit code.
+ *
+ */
+
+.global kvmppc_handler_highmem
+kvmppc_handler_highmem:
+
+	/*
+	 * Register usage at this point:
+	 *
+	 * R00   = guest R13
+	 * R01   = host R1
+	 * R02   = host R2
+	 * R10   = guest PC
+	 * R11   = guest MSR
+	 * R12   = exit handler id
+	 * R13   = PACA
+	 * PACA.exmc.R9    = guest R1
+	 * PACA.exmc.R10   = guest R10
+	 * PACA.exmc.R11   = guest R11
+	 * PACA.exmc.R12   = guest R12
+	 * PACA.exmc.R13   = guest R2
+	 * PACA.exmc.DAR   = guest DAR
+	 * PACA.exmc.DSISR = guest DSISR
+	 * PACA.exmc.LR    = guest instruction
+	 * PACA.exmc.CCR   = guest CR
+	 * PACA.exmc.SRR0  = guest R0
+	 *
+	 */
+
+	std	r3, (PACA_EXMC+EX_R3)(r13)
+
+	/* save the exit id in R3 */
+	mr	r3, r12
+
+	/* R12 = vcpu */
+	ld	r12, GPR4(r1)
+
+	/* Now save the guest state */
+
+	std	r0, VCPU_GPR(r13)(r12)
+	std	r4, VCPU_GPR(r4)(r12)
+	std	r5, VCPU_GPR(r5)(r12)
+	std	r6, VCPU_GPR(r6)(r12)
+	std	r7, VCPU_GPR(r7)(r12)
+	std	r8, VCPU_GPR(r8)(r12)
+	std	r9, VCPU_GPR(r9)(r12)
+
+	/* get registers from PACA */
+	mfpaca	r5, r0, EX_SRR0, r12
+	mfpaca	r5, r3, EX_R3, r12
+	mfpaca	r5, r1, EX_R9, r12
+	mfpaca	r5, r10, EX_R10, r12
+	mfpaca	r5, r11, EX_R11, r12
+	mfpaca	r5, r12, EX_R12, r12
+	mfpaca	r5, r2, EX_R13, r12
+
+	lwz	r5, (PACA_EXMC+EX_LR)(r13)
+	stw	r5, VCPU_LAST_INST(r12)
+
+	lwz	r5, (PACA_EXMC+EX_CCR)(r13)
+	stw	r5, VCPU_CR(r12)
+
+	ld	r5, VCPU_HFLAGS(r12)
+	rldicl.	r5, r5, 0, 63		/* CR = ((r5 & 1) == 0) */
+	beq	no_dcbz32_off
+
+	mfspr   r5,SPRN_HID5
+	rldimi  r5,r5,6,56
+	mtspr   SPRN_HID5,r5
+
+no_dcbz32_off:
+
+	/* XXX maybe skip on lightweight? */
+	std	r14, VCPU_GPR(r14)(r12)
+	std	r15, VCPU_GPR(r15)(r12)
+	std	r16, VCPU_GPR(r16)(r12)
+	std	r17, VCPU_GPR(r17)(r12)
+	std	r18, VCPU_GPR(r18)(r12)
+	std	r19, VCPU_GPR(r19)(r12)
+	std	r20, VCPU_GPR(r20)(r12)
+	std	r21, VCPU_GPR(r21)(r12)
+	std	r22, VCPU_GPR(r22)(r12)
+	std	r23, VCPU_GPR(r23)(r12)
+	std	r24, VCPU_GPR(r24)(r12)
+	std	r25, VCPU_GPR(r25)(r12)
+	std	r26, VCPU_GPR(r26)(r12)
+	std	r27, VCPU_GPR(r27)(r12)
+	std	r28, VCPU_GPR(r28)(r12)
+	std	r29, VCPU_GPR(r29)(r12)
+	std	r30, VCPU_GPR(r30)(r12)
+	std	r31, VCPU_GPR(r31)(r12)
+
+	/* Restore non-volatile host registers (r14 - r31) */
+	REST_NVGPRS(r1)
+
+	/* Save guest PC (R10) */
+	std	r10, VCPU_PC(r12)
+
+	/* Save guest msr (R11) */
+	std	r11, VCPU_SHADOW_MSR(r12)
+
+	/* Save guest CTR (in R12) */
+	mfctr	r5
+	std	r5, VCPU_CTR(r12)
+
+	/* Save guest LR */
+	mflr	r5
+	std	r5, VCPU_LR(r12)
+
+	/* Save guest XER */
+	mfxer	r5
+	std	r5, VCPU_XER(r12)
+
+	/* Save guest DAR */
+	ld	r5, (PACA_EXMC+EX_DAR)(r13)
+	std	r5, VCPU_FAULT_DEAR(r12)
+
+	/* Save guest DSISR */
+	lwz	r5, (PACA_EXMC+EX_DSISR)(r13)
+	std	r5, VCPU_FAULT_DSISR(r12)
+
+	/* Restore host msr -> SRR1 */
+	ld	r7, VCPU_HOST_MSR(r12)
+	mtsrr1	r7
+
+	/* Restore host IP -> SRR0 */
+	ld	r6, VCPU_HOST_RETIP(r12)
+	mtsrr0	r6
+
+	/*
+	 * For some interrupts, we need to call the real Linux
+	 * handler, so it can do work for us. This has to happen
+	 * as if the interrupt arrived from the kernel though,
+	 * so let's fake it here where most state is restored.
+	 *
+	 * Call Linux for hardware interrupts/decrementer
+	 * r3 = address of interrupt handler (exit reason)
+	 */
+
+	cmpwi	r3, BOOK3S_INTERRUPT_EXTERNAL
+	beq	call_linux_handler
+	cmpwi	r3, BOOK3S_INTERRUPT_DECREMENTER
+	beq	call_linux_handler
+
+	/* Back to Interruptible Mode! (goto kvm_return_point) */
+	RFI
+
+call_linux_handler:
+
+	/*
+	 * If we land here we need to jump back to the handler we
+	 * came from.
+	 *
+	 * We have a page that we can access from real mode, so let's
+	 * jump back to that and use it as a trampoline to get back into the
+	 * interrupt handler!
+	 *
+	 * R3 still contains the exit code,
+	 * R6 VCPU_HOST_RETIP and
+	 * R7 VCPU_HOST_MSR
+	 */
+
+	mtlr	r3
+
+	ld	r5, VCPU_TRAMPOLINE_LOWMEM(r12)
+	mtsrr0	r5
+	LOAD_REG_IMMEDIATE(r5, MSR_KERNEL & ~(MSR_IR | MSR_DR))
+	mtsrr1	r5
+
+	RFI
+
+.global kvm_return_point
+kvm_return_point:
+
+	/* Jump back to lightweight entry if we're supposed to */
+	/* go back into the guest */
+	mr	r5, r3
+	/* Restore r3 (kvm_run) and r4 (vcpu) */
+	REST_2GPRS(3, r1)
+	bl	KVMPPC_HANDLE_EXIT
+
+#if 0 /* XXX get lightweight exits back */
+	cmpwi	r3, RESUME_GUEST
+	bne	kvm_exit_heavyweight
+
+	/* put VCPU and KVM_RUN back into place and roll again! */
+	REST_2GPRS(3, r1)
+	b	kvm_start_lightweight
+
+kvm_exit_heavyweight:
+	/* Restore non-volatile host registers */
+	ld	r14, _LINK(r1)
+	mtlr	r14
+	REST_NVGPRS(r1)
+
+	addi    r1, r1, SWITCH_FRAME_SIZE
+#else
+	ld	r4, _LINK(r1)
+	mtlr	r4
+
+	cmpwi	r3, RESUME_GUEST
+	bne	kvm_exit_heavyweight
+
+	REST_2GPRS(3, r1)
+
+	addi    r1, r1, SWITCH_FRAME_SIZE
+
+	b	kvm_start_entry
+
+kvm_exit_heavyweight:
+
+	addi    r1, r1, SWITCH_FRAME_SIZE
+#endif
+
+	blr
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 07/27] Add book3s_64 highmem asm code
@ 2009-10-30 15:47     ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti, Olof Johansson,
	linuxppc-dev-mnsaURCQ41sdnm+yROfE0A

This is the entry / exit code. In order to switch between host and guest
context, we need to switch register state and call the exit code handler on
exit.

This assembly file does exactly that. To finally enter the guest it calls
into book3s_64_slb.S. On exit it gets jumped at from book3s_64_slb.S too.

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v3 -> v4:

  - header rename fix
---
 arch/powerpc/include/asm/kvm_ppc.h      |    1 +
 arch/powerpc/kvm/book3s_64_interrupts.S |  392 +++++++++++++++++++++++++++++++
 2 files changed, 393 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_interrupts.S

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 2c6ee34..269ee46 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -39,6 +39,7 @@ enum emulation_result {
 extern int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
 extern char kvmppc_handlers_start[];
 extern unsigned long kvmppc_handler_len;
+extern void kvmppc_handler_highmem(void);
 
 extern void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu);
 extern int kvmppc_handle_load(struct kvm_run *run, struct kvm_vcpu *vcpu,
diff --git a/arch/powerpc/kvm/book3s_64_interrupts.S b/arch/powerpc/kvm/book3s_64_interrupts.S
new file mode 100644
index 0000000..7b55d80
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_interrupts.S
@@ -0,0 +1,392 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+#include <asm/ppc_asm.h>
+#include <asm/kvm_asm.h>
+#include <asm/reg.h>
+#include <asm/page.h>
+#include <asm/asm-offsets.h>
+#include <asm/exception-64s.h>
+
+#define KVMPPC_HANDLE_EXIT .kvmppc_handle_exit
+#define ULONG_SIZE 8
+#define VCPU_GPR(n)     (VCPU_GPRS + (n * ULONG_SIZE))
+
+.macro mfpaca tmp_reg, src_reg, offset, vcpu_reg
+	ld	\tmp_reg, (PACA_EXMC+\offset)(r13)
+	std	\tmp_reg, VCPU_GPR(\src_reg)(\vcpu_reg)
+.endm
+
+.macro DISABLE_INTERRUPTS
+       mfmsr   r0
+       rldicl  r0,r0,48,1
+       rotldi  r0,r0,16
+       mtmsrd  r0,1
+.endm
+
+/*****************************************************************************
+ *                                                                           *
+ *     Guest entry / exit code that is in kernel module memory (highmem)     *
+ *                                                                           *
+ ****************************************************************************/
+
+/* Registers:
+ *  r3: kvm_run pointer
+ *  r4: vcpu pointer
+ */
+_GLOBAL(__kvmppc_vcpu_entry)
+
+kvm_start_entry:
+	/* Write correct stack frame */
+	mflr    r0
+	std     r0,16(r1)
+
+	/* Save host state to the stack */
+	stdu	r1, -SWITCH_FRAME_SIZE(r1)
+
+	/* Save r3 (kvm_run) and r4 (vcpu) */
+	SAVE_2GPRS(3, r1)
+
+	/* Save non-volatile registers (r14 - r31) */
+	SAVE_NVGPRS(r1)
+
+	/* Save LR */
+	mflr	r14
+	std	r14, _LINK(r1)
+
+/* XXX optimize non-volatile loading away */
+kvm_start_lightweight:
+
+	DISABLE_INTERRUPTS
+
+	/* Save R1/R2 in the PACA */
+	std	r1, PACAR1(r13)
+	std	r2, (PACA_EXMC+EX_SRR0)(r13)
+	ld	r3, VCPU_HIGHMEM_HANDLER(r4)
+	std	r3, PACASAVEDMSR(r13)
+
+	/* Load non-volatile guest state from the vcpu */
+	ld	r14, VCPU_GPR(r14)(r4)
+	ld	r15, VCPU_GPR(r15)(r4)
+	ld	r16, VCPU_GPR(r16)(r4)
+	ld	r17, VCPU_GPR(r17)(r4)
+	ld	r18, VCPU_GPR(r18)(r4)
+	ld	r19, VCPU_GPR(r19)(r4)
+	ld	r20, VCPU_GPR(r20)(r4)
+	ld	r21, VCPU_GPR(r21)(r4)
+	ld	r22, VCPU_GPR(r22)(r4)
+	ld	r23, VCPU_GPR(r23)(r4)
+	ld	r24, VCPU_GPR(r24)(r4)
+	ld	r25, VCPU_GPR(r25)(r4)
+	ld	r26, VCPU_GPR(r26)(r4)
+	ld	r27, VCPU_GPR(r27)(r4)
+	ld	r28, VCPU_GPR(r28)(r4)
+	ld	r29, VCPU_GPR(r29)(r4)
+	ld	r30, VCPU_GPR(r30)(r4)
+	ld	r31, VCPU_GPR(r31)(r4)
+
+	ld	r9, VCPU_PC(r4)			/* r9 = vcpu->arch.pc */
+	ld	r10, VCPU_SHADOW_MSR(r4)	/* r10 = vcpu->arch.shadow_msr */
+
+	ld	r3, VCPU_TRAMPOLINE_ENTER(r4)
+	mtsrr0	r3
+
+	LOAD_REG_IMMEDIATE(r3, MSR_KERNEL & ~(MSR_IR | MSR_DR))
+	mtsrr1	r3
+
+	/* Load guest state in the respective registers */
+	lwz	r3, VCPU_CR(r4)		/* r3 = vcpu->arch.cr */
+	stw	r3, (PACA_EXMC + EX_CCR)(r13)
+
+	ld	r3, VCPU_CTR(r4)	/* r3 = vcpu->arch.ctr */
+	mtctr	r3			/* CTR = r3 */
+
+	ld	r3, VCPU_LR(r4)		/* r3 = vcpu->arch.lr */
+	mtlr	r3			/* LR = r3 */
+
+	ld	r3, VCPU_XER(r4)	/* r3 = vcpu->arch.xer */
+	std	r3, (PACA_EXMC + EX_R3)(r13)
+
+	/* Some guests may need to have dcbz set to 32 byte length.
+	 *
+	 * Usually we ensure that by patching the guest's instructions
+	 * to trap on dcbz and emulate it in the hypervisor.
+	 *
+	 * If we can, we should tell the CPU to use 32 byte dcbz though,
+	 * because that's a lot faster.
+	 */
+
+	ld	r3, VCPU_HFLAGS(r4)
+	rldicl.	r3, r3, 0, 63		/* CR = ((r3 & 1) == 0) */
+	beq	no_dcbz32_on
+
+	mfspr   r3,SPRN_HID5
+	ori     r3, r3, 0x80		/* XXX HID5_dcbz32 = 0x80 */
+	mtspr   SPRN_HID5,r3
+
+no_dcbz32_on:
+	/*	Load guest GPRs */
+
+	ld	r3, VCPU_GPR(r9)(r4)
+	std	r3, (PACA_EXMC + EX_R9)(r13)
+	ld	r3, VCPU_GPR(r10)(r4)
+	std	r3, (PACA_EXMC + EX_R10)(r13)
+	ld	r3, VCPU_GPR(r11)(r4)
+	std	r3, (PACA_EXMC + EX_R11)(r13)
+	ld	r3, VCPU_GPR(r12)(r4)
+	std	r3, (PACA_EXMC + EX_R12)(r13)
+	ld	r3, VCPU_GPR(r13)(r4)
+	std	r3, (PACA_EXMC + EX_R13)(r13)
+
+	ld	r0, VCPU_GPR(r0)(r4)
+	ld	r1, VCPU_GPR(r1)(r4)
+	ld	r2, VCPU_GPR(r2)(r4)
+	ld	r3, VCPU_GPR(r3)(r4)
+	ld	r5, VCPU_GPR(r5)(r4)
+	ld	r6, VCPU_GPR(r6)(r4)
+	ld	r7, VCPU_GPR(r7)(r4)
+	ld	r8, VCPU_GPR(r8)(r4)
+	ld	r4, VCPU_GPR(r4)(r4)
+
+	/* This sets the Magic value for the trampoline */
+
+	li	r11, 1
+	stb	r11, PACA_KVM_IN_GUEST(r13)
+
+	/* Jump to SLB patching handler and into our guest */
+	RFI
+
+/*
+ * This is the handler in module memory. It gets jumped at from the
+ * lowmem trampoline code, so it's basically the guest exit code.
+ *
+ */
+
+.global kvmppc_handler_highmem
+kvmppc_handler_highmem:
+
+	/*
+	 * Register usage at this point:
+	 *
+	 * R00   = guest R13
+	 * R01   = host R1
+	 * R02   = host R2
+	 * R10   = guest PC
+	 * R11   = guest MSR
+	 * R12   = exit handler id
+	 * R13   = PACA
+	 * PACA.exmc.R9    = guest R1
+	 * PACA.exmc.R10   = guest R10
+	 * PACA.exmc.R11   = guest R11
+	 * PACA.exmc.R12   = guest R12
+	 * PACA.exmc.R13   = guest R2
+	 * PACA.exmc.DAR   = guest DAR
+	 * PACA.exmc.DSISR = guest DSISR
+	 * PACA.exmc.LR    = guest instruction
+	 * PACA.exmc.CCR   = guest CR
+	 * PACA.exmc.SRR0  = guest R0
+	 *
+	 */
+
+	std	r3, (PACA_EXMC+EX_R3)(r13)
+
+	/* save the exit id in R3 */
+	mr	r3, r12
+
+	/* R12 = vcpu */
+	ld	r12, GPR4(r1)
+
+	/* Now save the guest state */
+
+	std	r0, VCPU_GPR(r13)(r12)
+	std	r4, VCPU_GPR(r4)(r12)
+	std	r5, VCPU_GPR(r5)(r12)
+	std	r6, VCPU_GPR(r6)(r12)
+	std	r7, VCPU_GPR(r7)(r12)
+	std	r8, VCPU_GPR(r8)(r12)
+	std	r9, VCPU_GPR(r9)(r12)
+
+	/* get registers from PACA */
+	mfpaca	r5, r0, EX_SRR0, r12
+	mfpaca	r5, r3, EX_R3, r12
+	mfpaca	r5, r1, EX_R9, r12
+	mfpaca	r5, r10, EX_R10, r12
+	mfpaca	r5, r11, EX_R11, r12
+	mfpaca	r5, r12, EX_R12, r12
+	mfpaca	r5, r2, EX_R13, r12
+
+	lwz	r5, (PACA_EXMC+EX_LR)(r13)
+	stw	r5, VCPU_LAST_INST(r12)
+
+	lwz	r5, (PACA_EXMC+EX_CCR)(r13)
+	stw	r5, VCPU_CR(r12)
+
+	ld	r5, VCPU_HFLAGS(r12)
+	rldicl.	r5, r5, 0, 63		/* CR = ((r5 & 1) == 0) */
+	beq	no_dcbz32_off
+
+	mfspr   r5,SPRN_HID5
+	rldimi  r5,r5,6,56
+	mtspr   SPRN_HID5,r5
+
+no_dcbz32_off:
+
+	/* XXX maybe skip on lightweight? */
+	std	r14, VCPU_GPR(r14)(r12)
+	std	r15, VCPU_GPR(r15)(r12)
+	std	r16, VCPU_GPR(r16)(r12)
+	std	r17, VCPU_GPR(r17)(r12)
+	std	r18, VCPU_GPR(r18)(r12)
+	std	r19, VCPU_GPR(r19)(r12)
+	std	r20, VCPU_GPR(r20)(r12)
+	std	r21, VCPU_GPR(r21)(r12)
+	std	r22, VCPU_GPR(r22)(r12)
+	std	r23, VCPU_GPR(r23)(r12)
+	std	r24, VCPU_GPR(r24)(r12)
+	std	r25, VCPU_GPR(r25)(r12)
+	std	r26, VCPU_GPR(r26)(r12)
+	std	r27, VCPU_GPR(r27)(r12)
+	std	r28, VCPU_GPR(r28)(r12)
+	std	r29, VCPU_GPR(r29)(r12)
+	std	r30, VCPU_GPR(r30)(r12)
+	std	r31, VCPU_GPR(r31)(r12)
+
+	/* Restore non-volatile host registers (r14 - r31) */
+	REST_NVGPRS(r1)
+
+	/* Save guest PC (R10) */
+	std	r10, VCPU_PC(r12)
+
+	/* Save guest msr (R11) */
+	std	r11, VCPU_SHADOW_MSR(r12)
+
+	/* Save guest CTR (in R12) */
+	mfctr	r5
+	std	r5, VCPU_CTR(r12)
+
+	/* Save guest LR */
+	mflr	r5
+	std	r5, VCPU_LR(r12)
+
+	/* Save guest XER */
+	mfxer	r5
+	std	r5, VCPU_XER(r12)
+
+	/* Save guest DAR */
+	ld	r5, (PACA_EXMC+EX_DAR)(r13)
+	std	r5, VCPU_FAULT_DEAR(r12)
+
+	/* Save guest DSISR */
+	lwz	r5, (PACA_EXMC+EX_DSISR)(r13)
+	std	r5, VCPU_FAULT_DSISR(r12)
+
+	/* Restore host msr -> SRR1 */
+	ld	r7, VCPU_HOST_MSR(r12)
+	mtsrr1	r7
+
+	/* Restore host IP -> SRR0 */
+	ld	r6, VCPU_HOST_RETIP(r12)
+	mtsrr0	r6
+
+	/*
+	 * For some interrupts, we need to call the real Linux
+	 * handler, so it can do work for us. This has to happen
+	 * as if the interrupt arrived from the kernel though,
+	 * so let's fake it here where most state is restored.
+	 *
+	 * Call Linux for hardware interrupts/decrementer
+	 * r3 = address of interrupt handler (exit reason)
+	 */
+
+	cmpwi	r3, BOOK3S_INTERRUPT_EXTERNAL
+	beq	call_linux_handler
+	cmpwi	r3, BOOK3S_INTERRUPT_DECREMENTER
+	beq	call_linux_handler
+
+	/* Back to Interruptible Mode! (goto kvm_return_point) */
+	RFI
+
+call_linux_handler:
+
+	/*
+	 * If we land here we need to jump back to the handler we
+	 * came from.
+	 *
+	 * We have a page that we can access from real mode, so let's
+	 * jump back to that and use it as a trampoline to get back into the
+	 * interrupt handler!
+	 *
+	 * R3 still contains the exit code,
+	 * R6 VCPU_HOST_RETIP and
+	 * R7 VCPU_HOST_MSR
+	 */
+
+	mtlr	r3
+
+	ld	r5, VCPU_TRAMPOLINE_LOWMEM(r12)
+	mtsrr0	r5
+	LOAD_REG_IMMEDIATE(r5, MSR_KERNEL & ~(MSR_IR | MSR_DR))
+	mtsrr1	r5
+
+	RFI
+
+.global kvm_return_point
+kvm_return_point:
+
+	/* Jump back to lightweight entry if we're supposed to */
+	/* go back into the guest */
+	mr	r5, r3
+	/* Restore r3 (kvm_run) and r4 (vcpu) */
+	REST_2GPRS(3, r1)
+	bl	KVMPPC_HANDLE_EXIT
+
+#if 0 /* XXX get lightweight exits back */
+	cmpwi	r3, RESUME_GUEST
+	bne	kvm_exit_heavyweight
+
+	/* put VCPU and KVM_RUN back into place and roll again! */
+	REST_2GPRS(3, r1)
+	b	kvm_start_lightweight
+
+kvm_exit_heavyweight:
+	/* Restore non-volatile host registers */
+	ld	r14, _LINK(r1)
+	mtlr	r14
+	REST_NVGPRS(r1)
+
+	addi    r1, r1, SWITCH_FRAME_SIZE
+#else
+	ld	r4, _LINK(r1)
+	mtlr	r4
+
+	cmpwi	r3, RESUME_GUEST
+	bne	kvm_exit_heavyweight
+
+	REST_2GPRS(3, r1)
+
+	addi    r1, r1, SWITCH_FRAME_SIZE
+
+	b	kvm_start_entry
+
+kvm_exit_heavyweight:
+
+	addi    r1, r1, SWITCH_FRAME_SIZE
+#endif
+
+	blr
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 08/27] Add SLB switching code for entry/exit
  2009-10-30 15:47 ` Alexander Graf
  (?)
@ 2009-10-30 15:47     ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti, Olof Johansson,
	linuxppc-dev-mnsaURCQ41sdnm+yROfE0A

This is the really low-level guest entry/exit code.

Book3s_64 has an SLB, which stores all ESID -> VSID mappings we're
currently aware of.

The segments in the guest differ from the ones on the host, so we need
to switch the SLB to tell the MMU that we're in a new context.

So we store a shadow of the guest's SLB in the PACA, switch to that on
entry and only restore bolted entries on exit, leaving the rest to the
Linux SLB fault handler.

That way we get a really clean way of switching the SLB.

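For reference, the shadow the entry code walks can be pictured like this in C
(sketch only; the asm below indexes PACA_KVM_SLB_MAX 16-byte entries at
PACA_KVM_SLB in the PACA, while the struct and helper names here are just
illustrative):

struct shadow_slb_entry {
	u64 esid;	/* guest ESID, with SLB_ESID_V set when in use */
	u64 vsid;	/* matching VSID data for slbmte */
};

static void load_guest_slb(struct shadow_slb_entry *slb, int max)
{
	int i;

	/* after slbia/slbie have flushed the host entries ... */
	for (i = 0; i < max; i++) {
		if (!(slb[i].esid & SLB_ESID_V))
			continue;
		asm volatile("slbmte %0, %1" : : "r" (slb[i].vsid),
					         "r" (slb[i].esid));
	}
}
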
Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
---
 arch/powerpc/kvm/book3s_64_slb.S |  277 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 277 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_slb.S

diff --git a/arch/powerpc/kvm/book3s_64_slb.S b/arch/powerpc/kvm/book3s_64_slb.S
new file mode 100644
index 0000000..00a8367
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_slb.S
@@ -0,0 +1,277 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
+ */
+
+/******************************************************************************
+ *                                                                            *
+ *                               Entry code                                   *
+ *                                                                            *
+ *****************************************************************************/
+
+.global kvmppc_handler_trampoline_enter
+kvmppc_handler_trampoline_enter:
+
+	/* Required state:
+	 *
+	 * MSR = ~IR|DR
+	 * R13 = PACA
+	 * R9 = guest IP
+	 * R10 = guest MSR
+	 * R11 = free
+	 * R12 = free
+	 * PACA[PACA_EXMC + EX_R9] = guest R9
+	 * PACA[PACA_EXMC + EX_R10] = guest R10
+	 * PACA[PACA_EXMC + EX_R11] = guest R11
+	 * PACA[PACA_EXMC + EX_R12] = guest R12
+	 * PACA[PACA_EXMC + EX_R13] = guest R13
+	 * PACA[PACA_EXMC + EX_CCR] = guest CR
+	 * PACA[PACA_EXMC + EX_R3] = guest XER
+	 */
+
+	mtsrr0	r9
+	mtsrr1	r10
+
+	mtspr	SPRN_SPRG_SCRATCH0, r0
+
+	/* Remove LPAR shadow entries */
+
+#if SLB_NUM_BOLTED == 3
+
+	ld	r12, PACA_SLBSHADOWPTR(r13)
+	ld	r10, 0x10(r12)
+	ld	r11, 0x18(r12)
+	/* Invalid? Skip. */
+	rldicl. r0, r10, 37, 63
+	beq	slb_entry_skip_1
+	xoris	r9, r10, SLB_ESID_V@h
+	std	r9, 0x10(r12)
+slb_entry_skip_1:
+	ld	r9, 0x20(r12)
+	/* Invalid? Skip. */
+	rldicl. r0, r9, 37, 63
+	beq	slb_entry_skip_2
+	xoris	r9, r9, SLB_ESID_V@h
+	std	r9, 0x20(r12)
+slb_entry_skip_2:
+	ld	r9, 0x30(r12)
+	/* Invalid? Skip. */
+	rldicl. r0, r9, 37, 63
+	beq	slb_entry_skip_3
+	xoris	r9, r9, SLB_ESID_V@h
+	std	r9, 0x30(r12)
+slb_entry_skip_3:
+	
+#else
+#error unknown number of bolted entries
+#endif
+
+	/* Flush SLB */
+
+	slbia
+
+	/* r0 = esid & ESID_MASK */
+	rldicr  r10, r10, 0, 35
+	/* r0 |= CLASS_BIT(VSID) */
+	rldic   r12, r11, 56 - 36, 36
+	or      r10, r10, r12
+	slbie	r10
+
+	isync
+
+	/* Fill SLB with our shadow */
+
+	lbz	r12, PACA_KVM_SLB_MAX(r13)
+	mulli	r12, r12, 16
+	addi	r12, r12, PACA_KVM_SLB
+	add	r12, r12, r13
+
+	/* for (r11 = kvm_slb; r11 < kvm_slb + kvm_slb_size; r11+=slb_entry) */
+	li	r11, PACA_KVM_SLB
+	add	r11, r11, r13
+
+slb_loop_enter:
+
+	ld	r10, 0(r11)
+
+	rldicl. r0, r10, 37, 63
+	beq	slb_loop_enter_skip
+
+	ld	r9, 8(r11)
+	slbmte	r9, r10
+
+slb_loop_enter_skip:
+	addi	r11, r11, 16
+	cmpd	cr0, r11, r12
+	blt	slb_loop_enter
+
+slb_do_enter:
+
+	/* Enter guest */
+
+	mfspr	r0, SPRN_SPRG_SCRATCH0
+
+	ld	r9, (PACA_EXMC+EX_R9)(r13)
+	ld	r10, (PACA_EXMC+EX_R10)(r13)
+	ld	r12, (PACA_EXMC+EX_R12)(r13)
+
+	lwz	r11, (PACA_EXMC+EX_CCR)(r13)
+	mtcr	r11
+
+	ld	r11, (PACA_EXMC+EX_R3)(r13)
+	mtxer	r11
+
+	ld	r11, (PACA_EXMC+EX_R11)(r13)
+	ld	r13, (PACA_EXMC+EX_R13)(r13)
+
+	RFI
+kvmppc_handler_trampoline_enter_end:
+
+
+
+/******************************************************************************
+ *                                                                            *
+ *                               Exit code                                    *
+ *                                                                            *
+ *****************************************************************************/
+
+.global kvmppc_handler_trampoline_exit
+kvmppc_handler_trampoline_exit:
+
+	/* Register usage at this point:
+	 *
+	 * SPRG_SCRATCH0 = guest R13
+	 * R01           = host R1
+	 * R02           = host R2
+	 * R10           = guest PC
+	 * R11           = guest MSR
+	 * R12           = exit handler id
+	 * R13           = PACA
+	 * PACA.exmc.CCR  = guest CR
+	 * PACA.exmc.R9  = guest R1
+	 * PACA.exmc.R10 = guest R10
+	 * PACA.exmc.R11 = guest R11
+	 * PACA.exmc.R12 = guest R12
+	 * PACA.exmc.R13 = guest R2
+	 *
+	 */
+
+	/* Save registers */
+
+	std	r0, (PACA_EXMC+EX_SRR0)(r13)
+	std	r9, (PACA_EXMC+EX_R3)(r13)
+	std	r10, (PACA_EXMC+EX_LR)(r13)
+	std	r11, (PACA_EXMC+EX_DAR)(r13)
+
+	/*
+	 * In order for us to easily get the last instruction,
+	 * we got the #vmexit at, we exploit the fact that the
+	 * virtual layout is still the same here, so we can just
+	 * ld from the guest's PC address
+	 */
+
+	/* We only load the last instruction when it's safe */
+	cmpwi	r12, BOOK3S_INTERRUPT_DATA_STORAGE
+	beq	ld_last_inst
+	cmpwi	r12, BOOK3S_INTERRUPT_PROGRAM
+	beq	ld_last_inst
+
+	b	no_ld_last_inst
+
+ld_last_inst:
+	/* Save off the guest instruction we're at */
+	/*    1) enable paging for data */
+	mfmsr	r9
+	ori	r11, r9, MSR_DR			/* Enable paging for data */
+	mtmsr	r11
+	/*    2) fetch the instruction */
+	lwz	r0, 0(r10)
+	/*    3) disable paging again */
+	mtmsr	r9
+
+no_ld_last_inst:
+
+	/* Restore bolted entries from the shadow and fix it along the way */
+
+	/* We don't store anything in entry 0, so we don't need to take care of that */
+	slbia
+	isync
+
+#if SLB_NUM_BOLTED == 3
+
+	ld	r11, PACA_SLBSHADOWPTR(r13)
+
+	ld	r10, 0x10(r11)
+	cmpdi	r10, 0
+	beq	slb_exit_skip_1
+	oris	r10, r10, SLB_ESID_V@h
+	ld	r9, 0x18(r11)
+	slbmte	r9, r10
+	std	r10, 0x10(r11)
+slb_exit_skip_1:
+	
+	ld	r10, 0x20(r11)
+	cmpdi	r10, 0
+	beq	slb_exit_skip_2
+	oris	r10, r10, SLB_ESID_V@h
+	ld	r9, 0x28(r11)
+	slbmte	r9, r10
+	std	r10, 0x20(r11)
+slb_exit_skip_2:
+	
+	ld	r10, 0x30(r11)
+	cmpdi	r10, 0
+	beq	slb_exit_skip_3
+	oris	r10, r10, SLB_ESID_V@h
+	ld	r9, 0x38(r11)
+	slbmte	r9, r10
+	std	r10, 0x30(r11)
+slb_exit_skip_3:
+	
+#else
+#error unknown number of bolted entries
+#endif
+
+slb_do_exit:
+
+	/* Restore registers */
+
+	ld	r11, (PACA_EXMC+EX_DAR)(r13)
+	ld	r10, (PACA_EXMC+EX_LR)(r13)
+	ld	r9, (PACA_EXMC+EX_R3)(r13)
+
+	/* Save last inst */
+	stw	r0, (PACA_EXMC+EX_LR)(r13)
+
+	/* Save DAR and DSISR before going to paged mode */
+	mfdar	r0
+	std	r0, (PACA_EXMC+EX_DAR)(r13)
+	mfdsisr	r0
+	stw	r0, (PACA_EXMC+EX_DSISR)(r13)
+
+	/* RFI into the highmem handler */
+	mfmsr	r0
+	ori	r0, r0, MSR_IR|MSR_DR|MSR_RI	/* Enable paging */
+	mtsrr1	r0
+	ld	r0, PACASAVEDMSR(r13)		/* Highmem handler address */
+	mtsrr0	r0
+
+	mfspr	r0, SPRN_SPRG_SCRATCH0
+
+	RFI
+kvmppc_handler_trampoline_exit_end:
+
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 08/27] Add SLB switching code for entry/exit
@ 2009-10-30 15:47     ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Kevin Wolf, Arnd Bergmann, Hollis Blanchard, Marcelo Tosatti,
	kvm-ppc, linuxppc-dev, Avi Kivity, bphilips, Olof Johansson

This is the really low-level part of the guest entry/exit code.

Book3s_64 has an SLB, which stores all ESID -> VSID mappings we're
currently aware of.

The segments in the guest differ from the ones on the host, so we need
to switch the SLB to tell the MMU that we're in a new context.

So we store a shadow of the guest's SLB in the PACA, switch to that on
entry and only restore bolted entries on exit, leaving the rest to the
Linux SLB fault handler.

That way we get a really clean way of switching the SLB.
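
To illustrate the exit side: only the bolted host entries get their valid bit
set again and are reloaded; everything else stays invalid until the Linux SLB
fault handler brings it back on demand. A rough, self-contained C sketch of
that restore step (hypothetical names and a stand-in slbmte(); the real exit
path below works on the slb_shadow area):

  #include <stdint.h>
  #include <stdio.h>

  #define SLB_ESID_V (1ULL << 27)  /* valid bit in the ESID word */
  #define SLB_NUM_BOLTED 3         /* matches the #if in the code below */

  struct slb_shadow_entry {
          uint64_t esid;
          uint64_t vsid;
  };

  /* Stand-in for slbmte: show which bolted entry would be reinstalled. */
  static void slbmte(uint64_t vsid, uint64_t esid)
  {
          printf("restore bolted: esid=%#llx vsid=%#llx\n",
                 (unsigned long long)esid, (unsigned long long)vsid);
  }

  int main(void)
  {
          struct slb_shadow_entry bolted[SLB_NUM_BOLTED] = {
                  { 0xc000000000000000ULL, 0x1111 },  /* e.g. bolted kernel segment */
                  { 0xd000000000000000ULL, 0x2222 },  /* e.g. bolted vmalloc segment */
                  { 0, 0 },                           /* empty slot, skipped */
          };

          for (int i = 0; i < SLB_NUM_BOLTED; i++) {
                  if (!bolted[i].esid)                /* nothing stored here */
                          continue;
                  bolted[i].esid |= SLB_ESID_V;       /* mark the entry valid again */
                  slbmte(bolted[i].vsid, bolted[i].esid);
          }

          return 0;
  }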

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kvm/book3s_64_slb.S |  277 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 277 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_slb.S

diff --git a/arch/powerpc/kvm/book3s_64_slb.S b/arch/powerpc/kvm/book3s_64_slb.S
new file mode 100644
index 0000000..00a8367
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_slb.S
@@ -0,0 +1,277 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+/******************************************************************************
+ *                                                                            *
+ *                               Entry code                                   *
+ *                                                                            *
+ *****************************************************************************/
+
+.global kvmppc_handler_trampoline_enter
+kvmppc_handler_trampoline_enter:
+
+	/* Required state:
+	 *
+	 * MSR = ~IR|DR
+	 * R13 = PACA
+	 * R9 = guest IP
+	 * R10 = guest MSR
+	 * R11 = free
+	 * R12 = free
+	 * PACA[PACA_EXMC + EX_R9] = guest R9
+	 * PACA[PACA_EXMC + EX_R10] = guest R10
+	 * PACA[PACA_EXMC + EX_R11] = guest R11
+	 * PACA[PACA_EXMC + EX_R12] = guest R12
+	 * PACA[PACA_EXMC + EX_R13] = guest R13
+	 * PACA[PACA_EXMC + EX_CCR] = guest CR
+	 * PACA[PACA_EXMC + EX_R3] = guest XER
+	 */
+
+	mtsrr0	r9
+	mtsrr1	r10
+
+	mtspr	SPRN_SPRG_SCRATCH0, r0
+
+	/* Remove LPAR shadow entries */
+
+#if SLB_NUM_BOLTED == 3
+
+	ld	r12, PACA_SLBSHADOWPTR(r13)
+	ld	r10, 0x10(r12)
+	ld	r11, 0x18(r12)
+	/* Invalid? Skip. */
+	rldicl. r0, r10, 37, 63
+	beq	slb_entry_skip_1
+	xoris	r9, r10, SLB_ESID_V@h
+	std	r9, 0x10(r12)
+slb_entry_skip_1:
+	ld	r9, 0x20(r12)
+	/* Invalid? Skip. */
+	rldicl. r0, r9, 37, 63
+	beq	slb_entry_skip_2
+	xoris	r9, r9, SLB_ESID_V@h
+	std	r9, 0x20(r12)
+slb_entry_skip_2:
+	ld	r9, 0x30(r12)
+	/* Invalid? Skip. */
+	rldicl. r0, r9, 37, 63
+	beq	slb_entry_skip_3
+	xoris	r9, r9, SLB_ESID_V@h
+	std	r9, 0x30(r12)
+slb_entry_skip_3:
+	
+#else
+#error unknown number of bolted entries
+#endif
+
+	/* Flush SLB */
+
+	slbia
+
+	/* r0 = esid & ESID_MASK */
+	rldicr  r10, r10, 0, 35
+	/* r0 |= CLASS_BIT(VSID) */
+	rldic   r12, r11, 56 - 36, 36
+	or      r10, r10, r12
+	slbie	r10
+
+	isync
+
+	/* Fill SLB with our shadow */
+
+	lbz	r12, PACA_KVM_SLB_MAX(r13)
+	mulli	r12, r12, 16
+	addi	r12, r12, PACA_KVM_SLB
+	add	r12, r12, r13
+
+	/* for (r11 = kvm_slb; r11 < kvm_slb + kvm_slb_size; r11+=slb_entry) */
+	li	r11, PACA_KVM_SLB
+	add	r11, r11, r13
+
+slb_loop_enter:
+
+	ld	r10, 0(r11)
+
+	rldicl. r0, r10, 37, 63
+	beq	slb_loop_enter_skip
+
+	ld	r9, 8(r11)
+	slbmte	r9, r10
+
+slb_loop_enter_skip:
+	addi	r11, r11, 16
+	cmpd	cr0, r11, r12
+	blt	slb_loop_enter
+
+slb_do_enter:
+
+	/* Enter guest */
+
+	mfspr	r0, SPRN_SPRG_SCRATCH0
+
+	ld	r9, (PACA_EXMC+EX_R9)(r13)
+	ld	r10, (PACA_EXMC+EX_R10)(r13)
+	ld	r12, (PACA_EXMC+EX_R12)(r13)
+
+	lwz	r11, (PACA_EXMC+EX_CCR)(r13)
+	mtcr	r11
+
+	ld	r11, (PACA_EXMC+EX_R3)(r13)
+	mtxer	r11
+
+	ld	r11, (PACA_EXMC+EX_R11)(r13)
+	ld	r13, (PACA_EXMC+EX_R13)(r13)
+
+	RFI
+kvmppc_handler_trampoline_enter_end:
+
+
+
+/******************************************************************************
+ *                                                                            *
+ *                               Exit code                                    *
+ *                                                                            *
+ *****************************************************************************/
+
+.global kvmppc_handler_trampoline_exit
+kvmppc_handler_trampoline_exit:
+
+	/* Register usage at this point:
+	 *
+	 * SPRG_SCRATCH0 = guest R13
+	 * R01           = host R1
+	 * R02           = host R2
+	 * R10           = guest PC
+	 * R11           = guest MSR
+	 * R12           = exit handler id
+	 * R13           = PACA
+	 * PACA.exmc.CCR  = guest CR
+	 * PACA.exmc.R9  = guest R1
+	 * PACA.exmc.R10 = guest R10
+	 * PACA.exmc.R11 = guest R11
+	 * PACA.exmc.R12 = guest R12
+	 * PACA.exmc.R13 = guest R2
+	 *
+	 */
+
+	/* Save registers */
+
+	std	r0, (PACA_EXMC+EX_SRR0)(r13)
+	std	r9, (PACA_EXMC+EX_R3)(r13)
+	std	r10, (PACA_EXMC+EX_LR)(r13)
+	std	r11, (PACA_EXMC+EX_DAR)(r13)
+
+	/*
+	 * In order for us to easily get the last instruction,
+	 * we got the #vmexit at, we exploit the fact that the
+	 * virtual layout is still the same here, so we can just
+	 * ld from the guest's PC address
+	 */
+
+	/* We only load the last instruction when it's safe */
+	cmpwi	r12, BOOK3S_INTERRUPT_DATA_STORAGE
+	beq	ld_last_inst
+	cmpwi	r12, BOOK3S_INTERRUPT_PROGRAM
+	beq	ld_last_inst
+
+	b	no_ld_last_inst
+
+ld_last_inst:
+	/* Save off the guest instruction we're at */
+	/*    1) enable paging for data */
+	mfmsr	r9
+	ori	r11, r9, MSR_DR			/* Enable paging for data */
+	mtmsr	r11
+	/*    2) fetch the instruction */
+	lwz	r0, 0(r10)
+	/*    3) disable paging again */
+	mtmsr	r9
+
+no_ld_last_inst:
+
+	/* Restore bolted entries from the shadow and fix it along the way */
+
+	/* We don't store anything in entry 0, so we don't need to take care of that */
+	slbia
+	isync
+
+#if SLB_NUM_BOLTED == 3
+
+	ld	r11, PACA_SLBSHADOWPTR(r13)
+
+	ld	r10, 0x10(r11)
+	cmpdi	r10, 0
+	beq	slb_exit_skip_1
+	oris	r10, r10, SLB_ESID_V@h
+	ld	r9, 0x18(r11)
+	slbmte	r9, r10
+	std	r10, 0x10(r11)
+slb_exit_skip_1:
+	
+	ld	r10, 0x20(r11)
+	cmpdi	r10, 0
+	beq	slb_exit_skip_2
+	oris	r10, r10, SLB_ESID_V@h
+	ld	r9, 0x28(r11)
+	slbmte	r9, r10
+	std	r10, 0x20(r11)
+slb_exit_skip_2:
+	
+	ld	r10, 0x30(r11)
+	cmpdi	r10, 0
+	beq	slb_exit_skip_3
+	oris	r10, r10, SLB_ESID_V@h
+	ld	r9, 0x38(r11)
+	slbmte	r9, r10
+	std	r10, 0x30(r11)
+slb_exit_skip_3:
+	
+#else
+#error unknown number of bolted entries
+#endif
+
+slb_do_exit:
+
+	/* Restore registers */
+
+	ld	r11, (PACA_EXMC+EX_DAR)(r13)
+	ld	r10, (PACA_EXMC+EX_LR)(r13)
+	ld	r9, (PACA_EXMC+EX_R3)(r13)
+
+	/* Save last inst */
+	stw	r0, (PACA_EXMC+EX_LR)(r13)
+
+	/* Save DAR and DSISR before going to paged mode */
+	mfdar	r0
+	std	r0, (PACA_EXMC+EX_DAR)(r13)
+	mfdsisr	r0
+	stw	r0, (PACA_EXMC+EX_DSISR)(r13)
+
+	/* RFI into the highmem handler */
+	mfmsr	r0
+	ori	r0, r0, MSR_IR|MSR_DR|MSR_RI	/* Enable paging */
+	mtsrr1	r0
+	ld	r0, PACASAVEDMSR(r13)		/* Highmem handler address */
+	mtsrr0	r0
+
+	mfspr	r0, SPRN_SPRG_SCRATCH0
+
+	RFI
+kvmppc_handler_trampoline_exit_end:
+
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 08/27] Add SLB switching code for entry/exit
@ 2009-10-30 15:47     ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti, Olof Johansson,
	linuxppc-dev-mnsaURCQ41sdnm+yROfE0A

This is the really low-level part of the guest entry/exit code.

Book3s_64 has an SLB, which stores all ESID -> VSID mappings we're
currently aware of.

The segments in the guest differ from the ones on the host, so we need
to switch the SLB to tell the MMU that we're in a new context.

So we store a shadow of the guest's SLB in the PACA, switch to that on
entry and only restore bolted entries on exit, leaving the rest to the
Linux SLB fault handler.

That way we get a really clean way of switching the SLB.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kvm/book3s_64_slb.S |  277 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 277 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_slb.S

diff --git a/arch/powerpc/kvm/book3s_64_slb.S b/arch/powerpc/kvm/book3s_64_slb.S
new file mode 100644
index 0000000..00a8367
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_slb.S
@@ -0,0 +1,277 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+/******************************************************************************
+ *                                                                            *
+ *                               Entry code                                   *
+ *                                                                            *
+ *****************************************************************************/
+
+.global kvmppc_handler_trampoline_enter
+kvmppc_handler_trampoline_enter:
+
+	/* Required state:
+	 *
+	 * MSR = ~IR|DR
+	 * R13 = PACA
+	 * R9 = guest IP
+	 * R10 = guest MSR
+	 * R11 = free
+	 * R12 = free
+	 * PACA[PACA_EXMC + EX_R9] = guest R9
+	 * PACA[PACA_EXMC + EX_R10] = guest R10
+	 * PACA[PACA_EXMC + EX_R11] = guest R11
+	 * PACA[PACA_EXMC + EX_R12] = guest R12
+	 * PACA[PACA_EXMC + EX_R13] = guest R13
+	 * PACA[PACA_EXMC + EX_CCR] = guest CR
+	 * PACA[PACA_EXMC + EX_R3] = guest XER
+	 */
+
+	mtsrr0	r9
+	mtsrr1	r10
+
+	mtspr	SPRN_SPRG_SCRATCH0, r0
+
+	/* Remove LPAR shadow entries */
+
+#if SLB_NUM_BOLTED == 3
+
+	ld	r12, PACA_SLBSHADOWPTR(r13)
+	ld	r10, 0x10(r12)
+	ld	r11, 0x18(r12)
+	/* Invalid? Skip. */
+	rldicl. r0, r10, 37, 63
+	beq	slb_entry_skip_1
+	xoris	r9, r10, SLB_ESID_V@h
+	std	r9, 0x10(r12)
+slb_entry_skip_1:
+	ld	r9, 0x20(r12)
+	/* Invalid? Skip. */
+	rldicl. r0, r9, 37, 63
+	beq	slb_entry_skip_2
+	xoris	r9, r9, SLB_ESID_V@h
+	std	r9, 0x20(r12)
+slb_entry_skip_2:
+	ld	r9, 0x30(r12)
+	/* Invalid? Skip. */
+	rldicl. r0, r9, 37, 63
+	beq	slb_entry_skip_3
+	xoris	r9, r9, SLB_ESID_V@h
+	std	r9, 0x30(r12)
+slb_entry_skip_3:
+	
+#else
+#error unknown number of bolted entries
+#endif
+
+	/* Flush SLB */
+
+	slbia
+
+	/* r0 = esid & ESID_MASK */
+	rldicr  r10, r10, 0, 35
+	/* r0 |= CLASS_BIT(VSID) */
+	rldic   r12, r11, 56 - 36, 36
+	or      r10, r10, r12
+	slbie	r10
+
+	isync
+
+	/* Fill SLB with our shadow */
+
+	lbz	r12, PACA_KVM_SLB_MAX(r13)
+	mulli	r12, r12, 16
+	addi	r12, r12, PACA_KVM_SLB
+	add	r12, r12, r13
+
+	/* for (r11 = kvm_slb; r11 < kvm_slb + kvm_slb_size; r11+=slb_entry) */
+	li	r11, PACA_KVM_SLB
+	add	r11, r11, r13
+
+slb_loop_enter:
+
+	ld	r10, 0(r11)
+
+	rldicl. r0, r10, 37, 63
+	beq	slb_loop_enter_skip
+
+	ld	r9, 8(r11)
+	slbmte	r9, r10
+
+slb_loop_enter_skip:
+	addi	r11, r11, 16
+	cmpd	cr0, r11, r12
+	blt	slb_loop_enter
+
+slb_do_enter:
+
+	/* Enter guest */
+
+	mfspr	r0, SPRN_SPRG_SCRATCH0
+
+	ld	r9, (PACA_EXMC+EX_R9)(r13)
+	ld	r10, (PACA_EXMC+EX_R10)(r13)
+	ld	r12, (PACA_EXMC+EX_R12)(r13)
+
+	lwz	r11, (PACA_EXMC+EX_CCR)(r13)
+	mtcr	r11
+
+	ld	r11, (PACA_EXMC+EX_R3)(r13)
+	mtxer	r11
+
+	ld	r11, (PACA_EXMC+EX_R11)(r13)
+	ld	r13, (PACA_EXMC+EX_R13)(r13)
+
+	RFI
+kvmppc_handler_trampoline_enter_end:
+
+
+
+/******************************************************************************
+ *                                                                            *
+ *                               Exit code                                    *
+ *                                                                            *
+ *****************************************************************************/
+
+.global kvmppc_handler_trampoline_exit
+kvmppc_handler_trampoline_exit:
+
+	/* Register usage at this point:
+	 *
+	 * SPRG_SCRATCH0 = guest R13
+	 * R01           = host R1
+	 * R02           = host R2
+	 * R10           = guest PC
+	 * R11           = guest MSR
+	 * R12           = exit handler id
+	 * R13           = PACA
+	 * PACA.exmc.CCR  = guest CR
+	 * PACA.exmc.R9  = guest R1
+	 * PACA.exmc.R10 = guest R10
+	 * PACA.exmc.R11 = guest R11
+	 * PACA.exmc.R12 = guest R12
+	 * PACA.exmc.R13 = guest R2
+	 *
+	 */
+
+	/* Save registers */
+
+	std	r0, (PACA_EXMC+EX_SRR0)(r13)
+	std	r9, (PACA_EXMC+EX_R3)(r13)
+	std	r10, (PACA_EXMC+EX_LR)(r13)
+	std	r11, (PACA_EXMC+EX_DAR)(r13)
+
+	/*
+	 * In order for us to easily get the last instruction,
+	 * we got the #vmexit at, we exploit the fact that the
+	 * virtual layout is still the same here, so we can just
+	 * ld from the guest's PC address
+	 */
+
+	/* We only load the last instruction when it's safe */
+	cmpwi	r12, BOOK3S_INTERRUPT_DATA_STORAGE
+	beq	ld_last_inst
+	cmpwi	r12, BOOK3S_INTERRUPT_PROGRAM
+	beq	ld_last_inst
+
+	b	no_ld_last_inst
+
+ld_last_inst:
+	/* Save off the guest instruction we're at */
+	/*    1) enable paging for data */
+	mfmsr	r9
+	ori	r11, r9, MSR_DR			/* Enable paging for data */
+	mtmsr	r11
+	/*    2) fetch the instruction */
+	lwz	r0, 0(r10)
+	/*    3) disable paging again */
+	mtmsr	r9
+
+no_ld_last_inst:
+
+	/* Restore bolted entries from the shadow and fix it along the way */
+
+	/* We don't store anything in entry 0, so we don't need to take care of that */
+	slbia
+	isync
+
+#if SLB_NUM_BOLTED == 3
+
+	ld	r11, PACA_SLBSHADOWPTR(r13)
+
+	ld	r10, 0x10(r11)
+	cmpdi	r10, 0
+	beq	slb_exit_skip_1
+	oris	r10, r10, SLB_ESID_V@h
+	ld	r9, 0x18(r11)
+	slbmte	r9, r10
+	std	r10, 0x10(r11)
+slb_exit_skip_1:
+	
+	ld	r10, 0x20(r11)
+	cmpdi	r10, 0
+	beq	slb_exit_skip_2
+	oris	r10, r10, SLB_ESID_V@h
+	ld	r9, 0x28(r11)
+	slbmte	r9, r10
+	std	r10, 0x20(r11)
+slb_exit_skip_2:
+	
+	ld	r10, 0x30(r11)
+	cmpdi	r10, 0
+	beq	slb_exit_skip_3
+	oris	r10, r10, SLB_ESID_V@h
+	ld	r9, 0x38(r11)
+	slbmte	r9, r10
+	std	r10, 0x30(r11)
+slb_exit_skip_3:
+	
+#else
+#error unknown number of bolted entries
+#endif
+
+slb_do_exit:
+
+	/* Restore registers */
+
+	ld	r11, (PACA_EXMC+EX_DAR)(r13)
+	ld	r10, (PACA_EXMC+EX_LR)(r13)
+	ld	r9, (PACA_EXMC+EX_R3)(r13)
+
+	/* Save last inst */
+	stw	r0, (PACA_EXMC+EX_LR)(r13)
+
+	/* Save DAR and DSISR before going to paged mode */
+	mfdar	r0
+	std	r0, (PACA_EXMC+EX_DAR)(r13)
+	mfdsisr	r0
+	stw	r0, (PACA_EXMC+EX_DSISR)(r13)
+
+	/* RFI into the highmem handler */
+	mfmsr	r0
+	ori	r0, r0, MSR_IR|MSR_DR|MSR_RI	/* Enable paging */
+	mtsrr1	r0
+	ld	r0, PACASAVEDMSR(r13)		/* Highmem handler address */
+	mtsrr0	r0
+
+	mfspr	r0, SPRN_SPRG_SCRATCH0
+
+	RFI
+kvmppc_handler_trampoline_exit_end:
+
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 09/27] Add interrupt handling code
  2009-10-30 15:47 ` Alexander Graf
  (?)
@ 2009-10-30 15:47     ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti, Olof Johansson,
	linuxppc-dev-mnsaURCQ41sdnm+yROfE0A

Getting from host state to the guest is only half the story. We also need
to return to our host context and handle whatever happened to get us out of
the guest.

On PowerPC every guest exit is an interrupt. So all we need to do is trap
the host's interrupt handlers and get into our #VMEXIT code to handle it.

PowerPCs also have a register that can add an offset to the interrupt handlers'
addresses, which is what the booke KVM code uses. Unfortunately that is a
hypervisor resource, and we also want to be able to run KVM when we're running
in an LPAR. So we have to hook into the Linux interrupt handlers.
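
In C terms, what every one of those hooks boils down to is roughly the
following (a simplified sketch, with a hypothetical struct standing in for the
PACA and plain functions standing in for the two branch targets):

  #include <stdio.h>

  struct paca_like {
          unsigned char kvm_in_guest;   /* mirrors the PACA_KVM_IN_GUEST byte */
  };

  static void linux_handler(int intno) { printf("resume Linux handler %#x\n", intno); }
  static void kvm_vmexit(int intno)    { printf("KVM #VMEXIT for %#x\n", intno); }

  /* What each INTERRUPT_TRAMPOLINE instance decides, expressed in C. */
  static void trampoline(struct paca_like *paca, int intno)
  {
          if (!paca->kvm_in_guest) {
                  linux_handler(intno);  /* not in a guest: take the normal path */
                  return;
          }
          paca->kvm_in_guest = 0;        /* unset guest state */
          kvm_vmexit(intno);             /* head for the SLB exit / highmem handler */
  }

  int main(void)
  {
          struct paca_like paca = { .kvm_in_guest = 1 };

          trampoline(&paca, 0x300);      /* e.g. a data storage interrupt from the guest */
          trampoline(&paca, 0x300);      /* same interrupt later, now from host context */
          return 0;
  }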

Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>

---

v3 -> v4:

  - header rename fix
---
 arch/powerpc/kvm/book3s_64_rmhandlers.S |  131 +++++++++++++++++++++++++++++++
 1 files changed, 131 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_rmhandlers.S

diff --git a/arch/powerpc/kvm/book3s_64_rmhandlers.S b/arch/powerpc/kvm/book3s_64_rmhandlers.S
new file mode 100644
index 0000000..fb7dd2e
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_rmhandlers.S
@@ -0,0 +1,131 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
+ */
+
+#include <asm/ppc_asm.h>
+#include <asm/kvm_asm.h>
+#include <asm/reg.h>
+#include <asm/page.h>
+#include <asm/asm-offsets.h>
+#include <asm/exception-64s.h>
+
+/*****************************************************************************
+ *                                                                           *
+ *        Real Mode handlers that need to be in low physical memory          *
+ *                                                                           *
+ ****************************************************************************/
+
+
+.macro INTERRUPT_TRAMPOLINE intno
+
+.global kvmppc_trampoline_\intno
+kvmppc_trampoline_\intno:
+
+	mtspr	SPRN_SPRG_SCRATCH0, r13		/* Save r13 */
+
+	/*
+	 * First thing to do is to find out if we're coming
+	 * from a KVM guest or a Linux process.
+	 *
+	 * To distinguish, we check a magic byte in the PACA
+	 */
+	mfspr	r13, SPRN_SPRG_PACA		/* r13 = PACA */
+	std	r12, (PACA_EXMC + EX_R12)(r13)
+	mfcr	r12
+	stw	r12, (PACA_EXMC + EX_CCR)(r13)
+	lbz	r12, PACA_KVM_IN_GUEST(r13)
+	cmpwi	r12, 0
+	bne	..kvmppc_handler_hasmagic_\intno
+	/* No KVM guest? Then jump back to the Linux handler! */
+	lwz	r12, (PACA_EXMC + EX_CCR)(r13)
+	mtcr	r12
+	ld	r12, (PACA_EXMC + EX_R12)(r13)
+	mfspr	r13, SPRN_SPRG_SCRATCH0		/* r13 = original r13 */
+	b	kvmppc_resume_\intno		/* Get back original handler */
+
+	/* Now we know we're handling a KVM guest */
+..kvmppc_handler_hasmagic_\intno:
+	/* Unset guest state */
+	li	r12, 0
+	stb	r12, PACA_KVM_IN_GUEST(r13)
+
+	std	r1, (PACA_EXMC+EX_R9)(r13)
+	std	r10, (PACA_EXMC+EX_R10)(r13)
+	std	r11, (PACA_EXMC+EX_R11)(r13)
+	std	r2, (PACA_EXMC+EX_R13)(r13)
+
+	mfsrr0	r10
+	mfsrr1	r11
+
+	/* Restore R1/R2 so we can handle faults */
+	ld	r1, PACAR1(r13)
+	ld	r2, (PACA_EXMC+EX_SRR0)(r13)
+
+	/* Let's store which interrupt we're handling */
+	li	r12, \intno
+
+	/* Jump into the SLB exit code that goes to the highmem handler */
+	b	kvmppc_handler_trampoline_exit
+
+.endm
+
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_SYSTEM_RESET
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_MACHINE_CHECK
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_DATA_STORAGE
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_DATA_SEGMENT
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_INST_STORAGE
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_INST_SEGMENT
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_EXTERNAL
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_ALIGNMENT
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_PROGRAM
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_FP_UNAVAIL
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_DECREMENTER
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_SYSCALL
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_TRACE
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_PERFMON
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_ALTIVEC
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_VSX
+
+/*
+ * This trampoline brings us back to a real mode handler
+ *
+ * Input Registers:
+ *
+ * R6 = SRR0
+ * R7 = SRR1
+ * LR = real-mode IP
+ *
+ */
+.global kvmppc_handler_lowmem_trampoline
+kvmppc_handler_lowmem_trampoline:
+
+	mtsrr0	r6
+	mtsrr1	r7
+	blr
+kvmppc_handler_lowmem_trampoline_end:
+
+.global kvmppc_trampoline_lowmem
+kvmppc_trampoline_lowmem:
+	.long kvmppc_handler_lowmem_trampoline - _stext
+
+.global kvmppc_trampoline_enter
+kvmppc_trampoline_enter:
+	.long kvmppc_handler_trampoline_enter - _stext
+
+#include "book3s_64_slb.S"
+
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 09/27] Add interrupt handling code
@ 2009-10-30 15:47     ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Kevin Wolf, Arnd Bergmann, Hollis Blanchard, Marcelo Tosatti,
	kvm-ppc, linuxppc-dev, Avi Kivity, bphilips, Olof Johansson

Getting from host state to the guest is only half the story. We also need
to return to our host context and handle whatever happened to get us out of
the guest.

On PowerPC every guest exit is an interrupt. So all we need to do is trap
the host's interrupt handlers and get into our #VMEXIT code to handle it.

PowerPCs also have a register that can add an offset to the interrupt handlers'
addresses, which is what the booke KVM code uses. Unfortunately that is a
hypervisor resource, and we also want to be able to run KVM when we're running
in an LPAR. So we have to hook into the Linux interrupt handlers.

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v3 -> v4:

  - header rename fix
---
 arch/powerpc/kvm/book3s_64_rmhandlers.S |  131 +++++++++++++++++++++++++++++++
 1 files changed, 131 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_rmhandlers.S

diff --git a/arch/powerpc/kvm/book3s_64_rmhandlers.S b/arch/powerpc/kvm/book3s_64_rmhandlers.S
new file mode 100644
index 0000000..fb7dd2e
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_rmhandlers.S
@@ -0,0 +1,131 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+#include <asm/ppc_asm.h>
+#include <asm/kvm_asm.h>
+#include <asm/reg.h>
+#include <asm/page.h>
+#include <asm/asm-offsets.h>
+#include <asm/exception-64s.h>
+
+/*****************************************************************************
+ *                                                                           *
+ *        Real Mode handlers that need to be in low physical memory          *
+ *                                                                           *
+ ****************************************************************************/
+
+
+.macro INTERRUPT_TRAMPOLINE intno
+
+.global kvmppc_trampoline_\intno
+kvmppc_trampoline_\intno:
+
+	mtspr	SPRN_SPRG_SCRATCH0, r13		/* Save r13 */
+
+	/*
+	 * First thing to do is to find out if we're coming
+	 * from a KVM guest or a Linux process.
+	 *
+	 * To distinguish, we check a magic byte in the PACA
+	 */
+	mfspr	r13, SPRN_SPRG_PACA		/* r13 = PACA */
+	std	r12, (PACA_EXMC + EX_R12)(r13)
+	mfcr	r12
+	stw	r12, (PACA_EXMC + EX_CCR)(r13)
+	lbz	r12, PACA_KVM_IN_GUEST(r13)
+	cmpwi	r12, 0
+	bne	..kvmppc_handler_hasmagic_\intno
+	/* No KVM guest? Then jump back to the Linux handler! */
+	lwz	r12, (PACA_EXMC + EX_CCR)(r13)
+	mtcr	r12
+	ld	r12, (PACA_EXMC + EX_R12)(r13)
+	mfspr	r13, SPRN_SPRG_SCRATCH0		/* r13 = original r13 */
+	b	kvmppc_resume_\intno		/* Get back original handler */
+
+	/* Now we know we're handling a KVM guest */
+..kvmppc_handler_hasmagic_\intno:
+	/* Unset guest state */
+	li	r12, 0
+	stb	r12, PACA_KVM_IN_GUEST(r13)
+
+	std	r1, (PACA_EXMC+EX_R9)(r13)
+	std	r10, (PACA_EXMC+EX_R10)(r13)
+	std	r11, (PACA_EXMC+EX_R11)(r13)
+	std	r2, (PACA_EXMC+EX_R13)(r13)
+
+	mfsrr0	r10
+	mfsrr1	r11
+
+	/* Restore R1/R2 so we can handle faults */
+	ld	r1, PACAR1(r13)
+	ld	r2, (PACA_EXMC+EX_SRR0)(r13)
+
+	/* Let's store which interrupt we're handling */
+	li	r12, \intno
+
+	/* Jump into the SLB exit code that goes to the highmem handler */
+	b	kvmppc_handler_trampoline_exit
+
+.endm
+
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_SYSTEM_RESET
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_MACHINE_CHECK
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_DATA_STORAGE
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_DATA_SEGMENT
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_INST_STORAGE
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_INST_SEGMENT
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_EXTERNAL
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_ALIGNMENT
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_PROGRAM
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_FP_UNAVAIL
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_DECREMENTER
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_SYSCALL
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_TRACE
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_PERFMON
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_ALTIVEC
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_VSX
+
+/*
+ * This trampoline brings us back to a real mode handler
+ *
+ * Input Registers:
+ *
+ * R6 = SRR0
+ * R7 = SRR1
+ * LR = real-mode IP
+ *
+ */
+.global kvmppc_handler_lowmem_trampoline
+kvmppc_handler_lowmem_trampoline:
+
+	mtsrr0	r6
+	mtsrr1	r7
+	blr
+kvmppc_handler_lowmem_trampoline_end:
+
+.global kvmppc_trampoline_lowmem
+kvmppc_trampoline_lowmem:
+	.long kvmppc_handler_lowmem_trampoline - _stext
+
+.global kvmppc_trampoline_enter
+kvmppc_trampoline_enter:
+	.long kvmppc_handler_trampoline_enter - _stext
+
+#include "book3s_64_slb.S"
+
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 09/27] Add interrupt handling code
@ 2009-10-30 15:47     ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti, Olof Johansson,
	linuxppc-dev-mnsaURCQ41sdnm+yROfE0A

Getting from host state to the guest is only half the story. We also need
to return to our host context and handle whatever happened to get us out of
the guest.

On PowerPC every guest exit is an interrupt. So all we need to do is trap
the host's interrupt handlers and get into our #VMEXIT code to handle it.

PowerPCs also have a register that can add an offset to the interrupt handlers'
addresses, which is what the booke KVM code uses. Unfortunately that is a
hypervisor resource, and we also want to be able to run KVM when we're running
in an LPAR. So we have to hook into the Linux interrupt handlers.

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v3 -> v4:

  - header rename fix
---
 arch/powerpc/kvm/book3s_64_rmhandlers.S |  131 +++++++++++++++++++++++++++++++
 1 files changed, 131 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_rmhandlers.S

diff --git a/arch/powerpc/kvm/book3s_64_rmhandlers.S b/arch/powerpc/kvm/book3s_64_rmhandlers.S
new file mode 100644
index 0000000..fb7dd2e
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_rmhandlers.S
@@ -0,0 +1,131 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+#include <asm/ppc_asm.h>
+#include <asm/kvm_asm.h>
+#include <asm/reg.h>
+#include <asm/page.h>
+#include <asm/asm-offsets.h>
+#include <asm/exception-64s.h>
+
+/*****************************************************************************
+ *                                                                           *
+ *        Real Mode handlers that need to be in low physical memory          *
+ *                                                                           *
+ ****************************************************************************/
+
+
+.macro INTERRUPT_TRAMPOLINE intno
+
+.global kvmppc_trampoline_\intno
+kvmppc_trampoline_\intno:
+
+	mtspr	SPRN_SPRG_SCRATCH0, r13		/* Save r13 */
+
+	/*
+	 * First thing to do is to find out if we're coming
+	 * from a KVM guest or a Linux process.
+	 *
+	 * To distinguish, we check a magic byte in the PACA
+	 */
+	mfspr	r13, SPRN_SPRG_PACA		/* r13 = PACA */
+	std	r12, (PACA_EXMC + EX_R12)(r13)
+	mfcr	r12
+	stw	r12, (PACA_EXMC + EX_CCR)(r13)
+	lbz	r12, PACA_KVM_IN_GUEST(r13)
+	cmpwi	r12, 0
+	bne	..kvmppc_handler_hasmagic_\intno
+	/* No KVM guest? Then jump back to the Linux handler! */
+	lwz	r12, (PACA_EXMC + EX_CCR)(r13)
+	mtcr	r12
+	ld	r12, (PACA_EXMC + EX_R12)(r13)
+	mfspr	r13, SPRN_SPRG_SCRATCH0		/* r13 = original r13 */
+	b	kvmppc_resume_\intno		/* Get back original handler */
+
+	/* Now we know we're handling a KVM guest */
+..kvmppc_handler_hasmagic_\intno:
+	/* Unset guest state */
+	li	r12, 0
+	stb	r12, PACA_KVM_IN_GUEST(r13)
+
+	std	r1, (PACA_EXMC+EX_R9)(r13)
+	std	r10, (PACA_EXMC+EX_R10)(r13)
+	std	r11, (PACA_EXMC+EX_R11)(r13)
+	std	r2, (PACA_EXMC+EX_R13)(r13)
+
+	mfsrr0	r10
+	mfsrr1	r11
+
+	/* Restore R1/R2 so we can handle faults */
+	ld	r1, PACAR1(r13)
+	ld	r2, (PACA_EXMC+EX_SRR0)(r13)
+
+	/* Let's store which interrupt we're handling */
+	li	r12, \intno
+
+	/* Jump into the SLB exit code that goes to the highmem handler */
+	b	kvmppc_handler_trampoline_exit
+
+.endm
+
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_SYSTEM_RESET
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_MACHINE_CHECK
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_DATA_STORAGE
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_DATA_SEGMENT
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_INST_STORAGE
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_INST_SEGMENT
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_EXTERNAL
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_ALIGNMENT
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_PROGRAM
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_FP_UNAVAIL
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_DECREMENTER
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_SYSCALL
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_TRACE
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_PERFMON
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_ALTIVEC
+INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_VSX
+
+/*
+ * This trampoline brings us back to a real mode handler
+ *
+ * Input Registers:
+ *
+ * R6 = SRR0
+ * R7 = SRR1
+ * LR = real-mode IP
+ *
+ */
+.global kvmppc_handler_lowmem_trampoline
+kvmppc_handler_lowmem_trampoline:
+
+	mtsrr0	r6
+	mtsrr1	r7
+	blr
+kvmppc_handler_lowmem_trampoline_end:
+
+.global kvmppc_trampoline_lowmem
+kvmppc_trampoline_lowmem:
+	.long kvmppc_handler_lowmem_trampoline - _stext
+
+.global kvmppc_trampoline_enter
+kvmppc_trampoline_enter:
+	.long kvmppc_handler_trampoline_enter - _stext
+
+#include "book3s_64_slb.S"
+
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 10/27] Add book3s.c
  2009-10-30 15:47 ` Alexander Graf
  (?)
@ 2009-10-30 15:47     ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti, Olof Johansson,
	linuxppc-dev-mnsaURCQ41sdnm+yROfE0A

This adds the book3s core handling file. Everything that is generic to desktop
PowerPC cores is handled here, including interrupt injection, MSR settings,
etc.

It basically takes over the same role as booke.c for embedded PowerPCs.
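
One of the things handled here is interrupt injection: interrupts are queued as
priority bits in a pending mask and delivered one at a time, with DEC and
external interrupts held back while MSR_EE is off. A small, self-contained C
model of that scheme (hypothetical names; the real code uses
kvmppc_book3s_queue_irqprio() and kvmppc_core_deliver_interrupts() below):

  #include <stdio.h>

  /* Lowest bit = highest priority, as with the BOOK3S_IRQPRIO_* values. */
  enum { PRIO_DEC, PRIO_EXTERNAL, PRIO_PROGRAM, PRIO_MAX };

  static unsigned long pending;
  static int msr_ee = 1;

  static void queue_irqprio(int prio)
  {
          pending |= 1UL << prio;
  }

  static int deliver_one(int prio)
  {
          if ((prio == PRIO_DEC || prio == PRIO_EXTERNAL) && !msr_ee)
                  return 0;               /* masked by MSR_EE: leave it pending */
          printf("inject interrupt for priority %d\n", prio);
          return 1;
  }

  static void deliver_interrupts(void)
  {
          for (int prio = 0; prio < PRIO_MAX; prio++) {
                  if (!(pending & (1UL << prio)))
                          continue;
                  if (deliver_one(prio)) {
                          pending &= ~(1UL << prio);
                          break;          /* one interrupt per guest entry */
                  }
          }
  }

  int main(void)
  {
          queue_irqprio(PRIO_PROGRAM);
          queue_irqprio(PRIO_DEC);
          deliver_interrupts();           /* DEC goes first (lower bit wins) */
          deliver_interrupts();           /* then the program interrupt */
          return 0;
  }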

Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>

---

v3 -> v4:

  - use context_id instead of mm_alloc

v4 -> v5:

  - make pvr 32 bits

v5 -> v6:

  - use /* */ instead of //
  - use kvm_for_each_vcpu
---
 arch/powerpc/kvm/book3s.c |  925 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 925 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s.c

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
new file mode 100644
index 0000000..42037d4
--- /dev/null
+++ b/arch/powerpc/kvm/book3s.c
@@ -0,0 +1,925 @@
+/*
+ * Copyright (C) 2009. SUSE Linux Products GmbH. All rights reserved.
+ *
+ * Authors:
+ *    Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
+ *    Kevin Wolf <mail-vbj5DHeKsUHgbcAU4aOf7A@public.gmane.org>
+ *
+ * Description:
+ * This file is derived from arch/powerpc/kvm/44x.c,
+ * by Hollis Blanchard <hollisb-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kvm_host.h>
+#include <linux/err.h>
+
+#include <asm/reg.h>
+#include <asm/cputable.h>
+#include <asm/cacheflush.h>
+#include <asm/tlbflush.h>
+#include <asm/uaccess.h>
+#include <asm/io.h>
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+#include <asm/mmu_context.h>
+#include <linux/sched.h>
+#include <linux/vmalloc.h>
+
+#define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU
+
+/* #define EXIT_DEBUG */
+/* #define EXIT_DEBUG_SIMPLE */
+
+/* Without AGGRESSIVE_DEC we only fire off a DEC interrupt when DEC turns 0.
+ * When set, we retrigger a DEC interrupt after that if DEC <= 0.
+ * PPC32 Linux runs faster without AGGRESSIVE_DEC, PPC64 Linux requires it. */
+
+/* #define AGGRESSIVE_DEC */
+
+struct kvm_stats_debugfs_item debugfs_entries[] = {
+	{ "exits",       VCPU_STAT(sum_exits) },
+	{ "mmio",        VCPU_STAT(mmio_exits) },
+	{ "sig",         VCPU_STAT(signal_exits) },
+	{ "sysc",        VCPU_STAT(syscall_exits) },
+	{ "inst_emu",    VCPU_STAT(emulated_inst_exits) },
+	{ "dec",         VCPU_STAT(dec_exits) },
+	{ "ext_intr",    VCPU_STAT(ext_intr_exits) },
+	{ "queue_intr",  VCPU_STAT(queue_intr) },
+	{ "halt_wakeup", VCPU_STAT(halt_wakeup) },
+	{ "pf_storage",  VCPU_STAT(pf_storage) },
+	{ "sp_storage",  VCPU_STAT(sp_storage) },
+	{ "pf_instruc",  VCPU_STAT(pf_instruc) },
+	{ "sp_instruc",  VCPU_STAT(sp_instruc) },
+	{ "ld",          VCPU_STAT(ld) },
+	{ "ld_slow",     VCPU_STAT(ld_slow) },
+	{ "st",          VCPU_STAT(st) },
+	{ "st_slow",     VCPU_STAT(st_slow) },
+	{ NULL }
+};
+
+void kvmppc_core_load_host_debugstate(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvmppc_core_load_guest_debugstate(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+	memcpy(get_paca()->kvm_slb, to_book3s(vcpu)->slb_shadow, sizeof(get_paca()->kvm_slb));
+	get_paca()->kvm_slb_max = to_book3s(vcpu)->slb_shadow_max;
+}
+
+void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
+{
+	memcpy(to_book3s(vcpu)->slb_shadow, get_paca()->kvm_slb, sizeof(get_paca()->kvm_slb));
+	to_book3s(vcpu)->slb_shadow_max = get_paca()->kvm_slb_max;
+}
+
+#if defined(AGGRESSIVE_DEC) || defined(EXIT_DEBUG)
+static u32 kvmppc_get_dec(struct kvm_vcpu *vcpu)
+{
+	u64 jd = mftb() - vcpu->arch.dec_jiffies;
+	return vcpu->arch.dec - jd;
+}
+#endif
+
+void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
+{
+	ulong old_msr = vcpu->arch.msr;
+
+#ifdef EXIT_DEBUG
+	printk(KERN_INFO "KVM: Set MSR to 0x%llx\n", msr);
+#endif
+	msr &= to_book3s(vcpu)->msr_mask;
+	vcpu->arch.msr = msr;
+	vcpu->arch.shadow_msr = msr | MSR_USER32;
+	vcpu->arch.shadow_msr &= ( MSR_VEC | MSR_VSX | MSR_FP | MSR_FE0 |
+				   MSR_USER64 | MSR_SE | MSR_BE | MSR_DE |
+				   MSR_FE1);
+
+	if (msr & (MSR_WE|MSR_POW)) {
+		if (!vcpu->arch.pending_exceptions) {
+			kvm_vcpu_block(vcpu);
+			vcpu->stat.halt_wakeup++;
+		}
+	}
+
+	if (((vcpu->arch.msr & (MSR_IR|MSR_DR)) != (old_msr & (MSR_IR|MSR_DR))) ||
+	    (vcpu->arch.msr & MSR_PR) != (old_msr & MSR_PR)) {
+		kvmppc_mmu_flush_segments(vcpu);
+		kvmppc_mmu_map_segment(vcpu, vcpu->arch.pc);
+	}
+}
+
+void kvmppc_inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 flags)
+{
+	vcpu->arch.srr0 = vcpu->arch.pc;
+	vcpu->arch.srr1 = vcpu->arch.msr | flags;
+	vcpu->arch.pc = to_book3s(vcpu)->hior + vec;
+	vcpu->arch.mmu.reset_msr(vcpu);
+}
+
+void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec)
+{
+	unsigned int prio;
+
+	vcpu->stat.queue_intr++;
+	switch (vec) {
+	case 0x100: prio = BOOK3S_IRQPRIO_SYSTEM_RESET;		break;
+	case 0x200: prio = BOOK3S_IRQPRIO_MACHINE_CHECK;	break;
+	case 0x300: prio = BOOK3S_IRQPRIO_DATA_STORAGE;		break;
+	case 0x380: prio = BOOK3S_IRQPRIO_DATA_SEGMENT;		break;
+	case 0x400: prio = BOOK3S_IRQPRIO_INST_STORAGE;		break;
+	case 0x480: prio = BOOK3S_IRQPRIO_INST_SEGMENT;		break;
+	case 0x500: prio = BOOK3S_IRQPRIO_EXTERNAL;		break;
+	case 0x600: prio = BOOK3S_IRQPRIO_ALIGNMENT;		break;
+	case 0x700: prio = BOOK3S_IRQPRIO_PROGRAM;		break;
+	case 0x800: prio = BOOK3S_IRQPRIO_FP_UNAVAIL;		break;
+	case 0x900: prio = BOOK3S_IRQPRIO_DECREMENTER;		break;
+	case 0xc00: prio = BOOK3S_IRQPRIO_SYSCALL;		break;
+	case 0xd00: prio = BOOK3S_IRQPRIO_DEBUG;		break;
+	case 0xf20: prio = BOOK3S_IRQPRIO_ALTIVEC;		break;
+	case 0xf40: prio = BOOK3S_IRQPRIO_VSX;			break;
+	default:    prio = BOOK3S_IRQPRIO_MAX;			break;
+	}
+
+	set_bit(prio, &vcpu->arch.pending_exceptions);
+#ifdef EXIT_DEBUG
+	printk(KERN_INFO "Queueing interrupt %x\n", vec);
+#endif
+}
+
+
+void kvmppc_core_queue_program(struct kvm_vcpu *vcpu)
+{
+	kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_PROGRAM);
+}
+
+void kvmppc_core_queue_dec(struct kvm_vcpu *vcpu)
+{
+	kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_DECREMENTER);
+}
+
+int kvmppc_core_pending_dec(struct kvm_vcpu *vcpu)
+{
+	return test_bit(BOOK3S_INTERRUPT_DECREMENTER >> 7, &vcpu->arch.pending_exceptions);
+}
+
+void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
+                                struct kvm_interrupt *irq)
+{
+	kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_EXTERNAL);
+}
+
+int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
+{
+	int deliver = 1;
+	int vec = 0;
+
+	switch (priority) {
+	case BOOK3S_IRQPRIO_DECREMENTER:
+		deliver = vcpu->arch.msr & MSR_EE;
+		vec = BOOK3S_INTERRUPT_DECREMENTER;
+		break;
+	case BOOK3S_IRQPRIO_EXTERNAL:
+		deliver = vcpu->arch.msr & MSR_EE;
+		vec = BOOK3S_INTERRUPT_EXTERNAL;
+		break;
+	case BOOK3S_IRQPRIO_SYSTEM_RESET:
+		vec = BOOK3S_INTERRUPT_SYSTEM_RESET;
+		break;
+	case BOOK3S_IRQPRIO_MACHINE_CHECK:
+		vec = BOOK3S_INTERRUPT_MACHINE_CHECK;
+		break;
+	case BOOK3S_IRQPRIO_DATA_STORAGE:
+		vec = BOOK3S_INTERRUPT_DATA_STORAGE;
+		break;
+	case BOOK3S_IRQPRIO_INST_STORAGE:
+		vec = BOOK3S_INTERRUPT_INST_STORAGE;
+		break;
+	case BOOK3S_IRQPRIO_DATA_SEGMENT:
+		vec = BOOK3S_INTERRUPT_DATA_SEGMENT;
+		break;
+	case BOOK3S_IRQPRIO_INST_SEGMENT:
+		vec = BOOK3S_INTERRUPT_INST_SEGMENT;
+		break;
+	case BOOK3S_IRQPRIO_ALIGNMENT:
+		vec = BOOK3S_INTERRUPT_ALIGNMENT;
+		break;
+	case BOOK3S_IRQPRIO_PROGRAM:
+		vec = BOOK3S_INTERRUPT_PROGRAM;
+		break;
+	case BOOK3S_IRQPRIO_VSX:
+		vec = BOOK3S_INTERRUPT_VSX;
+		break;
+	case BOOK3S_IRQPRIO_ALTIVEC:
+		vec = BOOK3S_INTERRUPT_ALTIVEC;
+		break;
+	case BOOK3S_IRQPRIO_FP_UNAVAIL:
+		vec = BOOK3S_INTERRUPT_FP_UNAVAIL;
+		break;
+	case BOOK3S_IRQPRIO_SYSCALL:
+		vec = BOOK3S_INTERRUPT_SYSCALL;
+		break;
+	case BOOK3S_IRQPRIO_DEBUG:
+		vec = BOOK3S_INTERRUPT_TRACE;
+		break;
+	case BOOK3S_IRQPRIO_PERFORMANCE_MONITOR:
+		vec = BOOK3S_INTERRUPT_PERFMON;
+		break;
+	default:
+		deliver = 0;
+		printk(KERN_ERR "KVM: Unknown interrupt: 0x%x\n", priority);
+		break;
+	}
+
+#if 0
+	printk(KERN_INFO "Deliver interrupt 0x%x? %x\n", vec, deliver);
+#endif
+
+	if (deliver)
+		kvmppc_inject_interrupt(vcpu, vec, 0ULL);
+
+	return deliver;
+}
+
+void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu)
+{
+	unsigned long *pending = &vcpu->arch.pending_exceptions;
+	unsigned int priority;
+
+	/* XXX be more clever here - no need to mftb() on every entry */
+	/* Issue DEC again if it's still active */
+#ifdef AGGRESSIVE_DEC
+	if (vcpu->arch.msr & MSR_EE)
+		if (kvmppc_get_dec(vcpu) & 0x80000000)
+			kvmppc_core_queue_dec(vcpu);
+#endif
+
+#ifdef EXIT_DEBUG
+	if (vcpu->arch.pending_exceptions)
+		printk(KERN_EMERG "KVM: Check pending: %lx\n", vcpu->arch.pending_exceptions);
+#endif
+	priority = __ffs(*pending);
+	while (priority <= (sizeof(unsigned int) * 8)) {
+		if (kvmppc_book3s_irqprio_deliver(vcpu, priority)) {
+			clear_bit(priority, &vcpu->arch.pending_exceptions);
+			break;
+		}
+
+		priority = find_next_bit(pending,
+					 BITS_PER_BYTE * sizeof(*pending),
+					 priority + 1);
+	}
+}
+
+void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr)
+{
+	vcpu->arch.pvr = pvr;
+	if ((pvr >= 0x330000) && (pvr < 0x70330000)) {
+		kvmppc_mmu_book3s_64_init(vcpu);
+		to_book3s(vcpu)->hior = 0xfff00000;
+		to_book3s(vcpu)->msr_mask = 0xffffffffffffffffULL;
+	} else {
+		kvmppc_mmu_book3s_32_init(vcpu);
+		to_book3s(vcpu)->hior = 0;
+		to_book3s(vcpu)->msr_mask = 0xffffffffULL;
+	}
+
+	/* If we are in hypervisor level on 970, we can tell the CPU to
+	 * treat DCBZ as 32 bytes store */
+	vcpu->arch.hflags &= ~BOOK3S_HFLAG_DCBZ32;
+	if (vcpu->arch.mmu.is_dcbz32(vcpu) && (mfmsr() & MSR_HV) &&
+	    !strcmp(cur_cpu_spec->platform, "ppc970"))
+		vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
+
+}
+
+/* Book3s_32 CPUs always have a 32-byte cache line size, which Linux assumes. To
+ * make Book3s_32 Linux work on Book3s_64, we have to make sure we trap dcbz to
+ * emulate a 32-byte dcbz length.
+ *
+ * The Book3s_64 inventors also realized this case and implemented a special bit
+ * in the HID5 register, which is a hypervisor resource. Thus we can't use it.
+ *
+ * My approach here is to patch the dcbz instruction on executing pages.
+ */
+static void kvmppc_patch_dcbz(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte)
+{
+	bool touched = false;
+	hva_t hpage;
+	u32 *page;
+	int i;
+
+	hpage = gfn_to_hva(vcpu->kvm, pte->raddr >> PAGE_SHIFT);
+	if (kvm_is_error_hva(hpage))
+		return;
+
+	hpage |= pte->raddr & ~PAGE_MASK;
+	hpage &= ~0xFFFULL;
+
+	page = vmalloc(HW_PAGE_SIZE);
+
+	if (copy_from_user(page, (void __user *)hpage, HW_PAGE_SIZE))
+		goto out;
+
+	for (i=0; i < HW_PAGE_SIZE / 4; i++)
+		if ((page[i] & 0xff0007ff) == INS_DCBZ) {
+			page[i] &= 0xfffffff7; // reserved instruction, so we trap
+			touched = true;
+		}
+
+	if (touched)
+		copy_to_user((void __user *)hpage, page, HW_PAGE_SIZE);
+
+out:
+	vfree(page);
+}
+
+static int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, bool data,
+			 struct kvmppc_pte *pte)
+{
+	int relocated = (vcpu->arch.msr & (data ? MSR_DR : MSR_IR));
+	int r;
+
+	if (relocated) {
+		r = vcpu->arch.mmu.xlate(vcpu, eaddr, pte, data);
+	} else {
+		pte->eaddr = eaddr;
+		pte->raddr = eaddr & 0xffffffff;
+		pte->vpage = eaddr >> 12;
+		switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
+		case 0:
+			pte->vpage |= VSID_REAL;
+		case MSR_DR:
+			pte->vpage |= VSID_REAL_DR;
+		case MSR_IR:
+			pte->vpage |= VSID_REAL_IR;
+		}
+		pte->may_read = true;
+		pte->may_write = true;
+		pte->may_execute = true;
+		r = 0;
+	}
+
+	return r;
+}
+
+static hva_t kvmppc_bad_hva(void)
+{
+	return PAGE_OFFSET;
+}
+
+static hva_t kvmppc_pte_to_hva(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte,
+			       bool read)
+{
+	hva_t hpage;
+
+	if (read && !pte->may_read)
+		goto err;
+
+	if (!read && !pte->may_write)
+		goto err;
+
+	hpage = gfn_to_hva(vcpu->kvm, pte->raddr >> PAGE_SHIFT);
+	if (kvm_is_error_hva(hpage))
+		goto err;
+
+	return hpage | (pte->raddr & ~PAGE_MASK);
+err:
+	return kvmppc_bad_hva();
+}
+
+int kvmppc_st(struct kvm_vcpu *vcpu, ulong eaddr, int size, void *ptr)
+{
+	struct kvmppc_pte pte;
+	hva_t hva = eaddr;
+
+	vcpu->stat.st++;
+
+	if (kvmppc_xlate(vcpu, eaddr, false, &pte))
+		goto err;
+
+	hva = kvmppc_pte_to_hva(vcpu, &pte, false);
+	if (kvm_is_error_hva(hva))
+		goto err;
+
+	if (copy_to_user((void __user *)hva, ptr, size)) {
+		printk(KERN_INFO "kvmppc_st at 0x%lx failed\n", hva);
+		goto err;
+	}
+
+	return 0;
+
+err:
+	return -ENOENT;
+}
+
+int kvmppc_ld(struct kvm_vcpu *vcpu, ulong eaddr, int size, void *ptr,
+		      bool data)
+{
+	struct kvmppc_pte pte;
+	hva_t hva = eaddr;
+
+	vcpu->stat.ld++;
+
+	if (kvmppc_xlate(vcpu, eaddr, data, &pte))
+		goto err;
+
+	hva = kvmppc_pte_to_hva(vcpu, &pte, true);
+	if (kvm_is_error_hva(hva))
+		goto err;
+
+	if (copy_from_user(ptr, (void __user *)hva, size)) {
+		printk(KERN_INFO "kvmppc_ld at 0x%lx failed\n", hva);
+		goto err;
+	}
+
+	return 0;
+
+err:
+	return -ENOENT;
+}
+
+static int kvmppc_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+	return kvm_is_visible_gfn(vcpu->kvm, gfn);
+}
+
+int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu,
+			    ulong eaddr, int vec)
+{
+	bool data = (vec == BOOK3S_INTERRUPT_DATA_STORAGE);
+	int r = RESUME_GUEST;
+	int relocated;
+	int page_found = 0;
+	struct kvmppc_pte pte;
+	bool is_mmio = false;
+
+	if ( vec == BOOK3S_INTERRUPT_DATA_STORAGE ) {
+		relocated = (vcpu->arch.msr & MSR_DR);
+	} else {
+		relocated = (vcpu->arch.msr & MSR_IR);
+	}
+
+	/* Resolve real address if translation turned on */
+	if (relocated) {
+		page_found = vcpu->arch.mmu.xlate(vcpu, eaddr, &pte, data);
+	} else {
+		pte.may_execute = true;
+		pte.may_read = true;
+		pte.may_write = true;
+		pte.raddr = eaddr & 0xffffffff;
+		pte.eaddr = eaddr;
+		pte.vpage = eaddr >> 12;
+		switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
+		case 0:
+			pte.vpage |= VSID_REAL;
+		case MSR_DR:
+			pte.vpage |= VSID_REAL_DR;
+		case MSR_IR:
+			pte.vpage |= VSID_REAL_IR;
+		}
+	}
+
+	if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
+	   (!(vcpu->arch.hflags & BOOK3S_HFLAG_DCBZ32))) {
+		/*
+		 * If we do the dcbz hack, we have to NX on every execution,
+		 * so we can patch the executing code. This renders our guest
+		 * NX-less.
+		 */
+		pte.may_execute = !data;
+	}
+
+	if (page_found == -ENOENT) {
+		/* Page not found in guest PTE entries */
+		vcpu->arch.dear = vcpu->arch.fault_dear;
+		to_book3s(vcpu)->dsisr = vcpu->arch.fault_dsisr;
+		vcpu->arch.msr |= (vcpu->arch.shadow_msr & 0x00000000f8000000ULL);
+		kvmppc_book3s_queue_irqprio(vcpu, vec);
+	} else if (page_found == -EPERM) {
+		/* Storage protection */
+		vcpu->arch.dear = vcpu->arch.fault_dear;
+		to_book3s(vcpu)->dsisr = vcpu->arch.fault_dsisr & ~DSISR_NOHPTE;
+		to_book3s(vcpu)->dsisr |= DSISR_PROTFAULT;
+		vcpu->arch.msr |= (vcpu->arch.shadow_msr & 0x00000000f8000000ULL);
+		kvmppc_book3s_queue_irqprio(vcpu, vec);
+	} else if (page_found == -EINVAL) {
+		/* Page not found in guest SLB */
+		vcpu->arch.dear = vcpu->arch.fault_dear;
+		kvmppc_book3s_queue_irqprio(vcpu, vec + 0x80);
+	} else if (!is_mmio &&
+		   kvmppc_visible_gfn(vcpu, pte.raddr >> PAGE_SHIFT)) {
+		/* The guest's PTE is not mapped yet. Map on the host */
+		kvmppc_mmu_map_page(vcpu, &pte);
+		if (data)
+			vcpu->stat.sp_storage++;
+		else if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
+			(!(vcpu->arch.hflags & BOOK3S_HFLAG_DCBZ32)))
+			kvmppc_patch_dcbz(vcpu, &pte);
+	} else {
+		/* MMIO */
+		vcpu->stat.mmio_exits++;
+		vcpu->arch.paddr_accessed = pte.raddr;
+		r = kvmppc_emulate_mmio(run, vcpu);
+		if ( r == RESUME_HOST_NV )
+			r = RESUME_HOST;
+		if ( r == RESUME_GUEST_NV )
+			r = RESUME_GUEST;
+	}
+
+	return r;
+}
+
+int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
+                       unsigned int exit_nr)
+{
+	int r = RESUME_HOST;
+
+	vcpu->stat.sum_exits++;
+
+	run->exit_reason = KVM_EXIT_UNKNOWN;
+	run->ready_for_interrupt_injection = 1;
+#ifdef EXIT_DEBUG
+	printk(KERN_EMERG "exit_nr=0x%x | pc=0x%lx | dar=0x%lx | dec=0x%x | msr=0x%lx\n",
+		exit_nr, vcpu->arch.pc, vcpu->arch.fault_dear,
+		kvmppc_get_dec(vcpu), vcpu->arch.msr);
+#elif defined (EXIT_DEBUG_SIMPLE)
+	if ((exit_nr != 0x900) && (exit_nr != 0x500))
+		printk(KERN_EMERG "exit_nr=0x%x | pc=0x%lx | dar=0x%lx | msr=0x%lx\n",
+			exit_nr, vcpu->arch.pc, vcpu->arch.fault_dear,
+			vcpu->arch.msr);
+#endif
+	kvm_resched(vcpu);
+	switch (exit_nr) {
+	case BOOK3S_INTERRUPT_INST_STORAGE:
+		vcpu->stat.pf_instruc++;
+		/* only care about PTEG not found errors, but leave NX alone */
+		if (vcpu->arch.shadow_msr & 0x40000000) {
+			r = kvmppc_handle_pagefault(run, vcpu, vcpu->arch.pc, exit_nr);
+			vcpu->stat.sp_instruc++;
+		} else if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
+			  (!(vcpu->arch.hflags & BOOK3S_HFLAG_DCBZ32))) {
+			/*
+			 * XXX If we do the dcbz hack we use the NX bit to flush&patch the page,
+			 *     so we can't use the NX bit inside the guest. Let's cross our fingers,
+			 *     that no guest that needs the dcbz hack does NX.
+			 */
+			kvmppc_mmu_pte_flush(vcpu, vcpu->arch.pc, ~0xFFFULL);
+		} else {
+			vcpu->arch.msr |= (vcpu->arch.shadow_msr & 0x58000000);
+			kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+			kvmppc_mmu_pte_flush(vcpu, vcpu->arch.pc, ~0xFFFULL);
+			r = RESUME_GUEST;
+		}
+		break;
+	case BOOK3S_INTERRUPT_DATA_STORAGE:
+		vcpu->stat.pf_storage++;
+		/* The only case we need to handle is missing shadow PTEs */
+		if (vcpu->arch.fault_dsisr & DSISR_NOHPTE) {
+			r = kvmppc_handle_pagefault(run, vcpu, vcpu->arch.fault_dear, exit_nr);
+		} else {
+			vcpu->arch.dear = vcpu->arch.fault_dear;
+			to_book3s(vcpu)->dsisr = vcpu->arch.fault_dsisr;
+			kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+			kvmppc_mmu_pte_flush(vcpu, vcpu->arch.dear, ~0xFFFULL);
+			r = RESUME_GUEST;
+		}
+		break;
+	case BOOK3S_INTERRUPT_DATA_SEGMENT:
+		if (kvmppc_mmu_map_segment(vcpu, vcpu->arch.fault_dear) < 0) {
+			vcpu->arch.dear = vcpu->arch.fault_dear;
+			kvmppc_book3s_queue_irqprio(vcpu,
+				BOOK3S_INTERRUPT_DATA_SEGMENT);
+		}
+		r = RESUME_GUEST;
+		break;
+	case BOOK3S_INTERRUPT_INST_SEGMENT:
+		if (kvmppc_mmu_map_segment(vcpu, vcpu->arch.pc) < 0) {
+			kvmppc_book3s_queue_irqprio(vcpu,
+				BOOK3S_INTERRUPT_INST_SEGMENT);
+		}
+		r = RESUME_GUEST;
+		break;
+	/* We're good on these - the host merely wanted to get our attention */
+	case BOOK3S_INTERRUPT_DECREMENTER:
+		vcpu->stat.dec_exits++;
+		r = RESUME_GUEST;
+		break;
+	case BOOK3S_INTERRUPT_EXTERNAL:
+		vcpu->stat.ext_intr_exits++;
+		r = RESUME_GUEST;
+		break;
+	case BOOK3S_INTERRUPT_PROGRAM:
+	{
+		enum emulation_result er;
+
+		if (vcpu->arch.msr & MSR_PR) {
+#ifdef EXIT_DEBUG
+			printk(KERN_INFO "Userspace triggered 0x700 exception at 0x%lx (0x%x)\n", vcpu->arch.pc, vcpu->arch.last_inst);
+#endif
+			if ((vcpu->arch.last_inst & 0xff0007ff) !=
+			    (INS_DCBZ & 0xfffffff7)) {
+				kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+				r = RESUME_GUEST;
+				break;
+			}
+		}
+
+		vcpu->stat.emulated_inst_exits++;
+		er = kvmppc_emulate_instruction(run, vcpu);
+		switch (er) {
+		case EMULATE_DONE:
+			r = RESUME_GUEST;
+			break;
+		case EMULATE_FAIL:
+			printk(KERN_CRIT "%s: emulation at %lx failed (%08x)\n",
+			       __func__, vcpu->arch.pc, vcpu->arch.last_inst);
+			kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+			r = RESUME_GUEST;
+			break;
+		default:
+			BUG();
+		}
+		break;
+	}
+	case BOOK3S_INTERRUPT_SYSCALL:
+#ifdef EXIT_DEBUG
+		printk(KERN_INFO "Syscall Nr %d\n", (int)vcpu->arch.gpr[0]);
+#endif
+		vcpu->stat.syscall_exits++;
+		kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+		r = RESUME_GUEST;
+		break;
+	case BOOK3S_INTERRUPT_MACHINE_CHECK:
+	case BOOK3S_INTERRUPT_FP_UNAVAIL:
+	case BOOK3S_INTERRUPT_TRACE:
+	case BOOK3S_INTERRUPT_ALTIVEC:
+	case BOOK3S_INTERRUPT_VSX:
+		kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+		r = RESUME_GUEST;
+		break;
+	default:
+		/* Ugh - bork here! What did we get? */
+		printk(KERN_EMERG "exit_nr=0x%x | pc=0x%lx | msr=0x%lx\n", exit_nr, vcpu->arch.pc, vcpu->arch.shadow_msr);
+		r = RESUME_HOST;
+		BUG();
+		break;
+	}
+
+
+	if (!(r & RESUME_HOST)) {
+		/* To avoid clobbering exit_reason, only check for signals if
+		 * we aren't already exiting to userspace for some other
+		 * reason. */
+		if (signal_pending(current)) {
+#ifdef EXIT_DEBUG
+			printk(KERN_EMERG "KVM: Going back to host\n");
+#endif
+			vcpu->stat.signal_exits++;
+			run->exit_reason = KVM_EXIT_INTR;
+			r = -EINTR;
+		} else {
+			/* In case an interrupt came in that was triggered
+			 * from userspace (like DEC), we need to check what
+			 * to inject now! */
+			kvmppc_core_deliver_interrupts(vcpu);
+		}
+	}
+
+#ifdef EXIT_DEBUG
+	printk(KERN_EMERG "KVM exit: vcpu=0x%p pc=0x%lx r=0x%x\n", vcpu, vcpu->arch.pc, r);
+#endif
+
+	return r;
+}
+
+int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	int i;
+
+	regs->pc = vcpu->arch.pc;
+	regs->cr = vcpu->arch.cr;
+	regs->ctr = vcpu->arch.ctr;
+	regs->lr = vcpu->arch.lr;
+	regs->xer = vcpu->arch.xer;
+	regs->msr = vcpu->arch.msr;
+	regs->srr0 = vcpu->arch.srr0;
+	regs->srr1 = vcpu->arch.srr1;
+	regs->pid = vcpu->arch.pid;
+	regs->sprg0 = vcpu->arch.sprg0;
+	regs->sprg1 = vcpu->arch.sprg1;
+	regs->sprg2 = vcpu->arch.sprg2;
+	regs->sprg3 = vcpu->arch.sprg3;
+	regs->sprg5 = vcpu->arch.sprg5;
+	regs->sprg6 = vcpu->arch.sprg6;
+	regs->sprg7 = vcpu->arch.sprg7;
+
+	for (i = 0; i < ARRAY_SIZE(regs->gpr); i++)
+		regs->gpr[i] = vcpu->arch.gpr[i];
+
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	int i;
+
+	vcpu->arch.pc = regs->pc;
+	vcpu->arch.cr = regs->cr;
+	vcpu->arch.ctr = regs->ctr;
+	vcpu->arch.lr = regs->lr;
+	vcpu->arch.xer = regs->xer;
+	kvmppc_set_msr(vcpu, regs->msr);
+	vcpu->arch.srr0 = regs->srr0;
+	vcpu->arch.srr1 = regs->srr1;
+	vcpu->arch.sprg0 = regs->sprg0;
+	vcpu->arch.sprg1 = regs->sprg1;
+	vcpu->arch.sprg2 = regs->sprg2;
+	vcpu->arch.sprg3 = regs->sprg3;
+	vcpu->arch.sprg5 = regs->sprg5;
+	vcpu->arch.sprg6 = regs->sprg6;
+	vcpu->arch.sprg7 = regs->sprg7;
+
+	for (i = 0; i < ARRAY_SIZE(vcpu->arch.gpr); i++)
+		vcpu->arch.gpr[i] = regs->gpr[i];
+
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
+                                  struct kvm_sregs *sregs)
+{
+	sregs->pvr = vcpu->arch.pvr;
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
+                                  struct kvm_sregs *sregs)
+{
+	kvmppc_set_pvr(vcpu, sregs->pvr);
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -ENOTSUPP;
+}
+
+int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -ENOTSUPP;
+}
+
+int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
+                                  struct kvm_translation *tr)
+{
+	return 0;
+}
+
+/*
+ * Get (and clear) the dirty memory log for a memory slot.
+ */
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
+				      struct kvm_dirty_log *log)
+{
+	struct kvm_memory_slot *memslot;
+	struct kvm_vcpu *vcpu;
+	ulong ga, ga_end;
+	int is_dirty = 0;
+	int r, n;
+
+	down_write(&kvm->slots_lock);
+
+	r = kvm_get_dirty_log(kvm, log, &is_dirty);
+	if (r)
+		goto out;
+
+	/* If nothing is dirty, don't bother messing with page tables. */
+	if (is_dirty) {
+		memslot = &kvm->memslots[log->slot];
+
+		ga = memslot->base_gfn << PAGE_SHIFT;
+		ga_end = ga + (memslot->npages << PAGE_SHIFT);
+
+		kvm_for_each_vcpu(n, vcpu, kvm)
+			kvmppc_mmu_pte_pflush(vcpu, ga, ga_end);
+
+		n = ALIGN(memslot->npages, BITS_PER_LONG) / 8;
+		memset(memslot->dirty_bitmap, 0, n);
+	}
+
+	r = 0;
+out:
+	up_write(&kvm->slots_lock);
+	return r;
+}
+
+int kvmppc_core_check_processor_compat(void)
+{
+	return 0;
+}
+
+struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s;
+	struct kvm_vcpu *vcpu;
+	int err;
+
+	vcpu_book3s = (struct kvmppc_vcpu_book3s *)__get_free_pages( GFP_KERNEL | __GFP_ZERO,
+			get_order(sizeof(struct kvmppc_vcpu_book3s)));
+	if (!vcpu_book3s) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	vcpu = &vcpu_book3s->vcpu;
+	err = kvm_vcpu_init(vcpu, kvm, id);
+	if (err)
+		goto free_vcpu;
+
+	vcpu->arch.host_retip = kvm_return_point;
+	vcpu->arch.host_msr = mfmsr();
+	/* default to book3s_64 (970fx) */
+	vcpu->arch.pvr = 0x3C0301;
+	kvmppc_set_pvr(vcpu, vcpu->arch.pvr);
+	vcpu_book3s->slb_nr = 64;
+
+	/* remember where some real-mode handlers are */
+	vcpu->arch.trampoline_lowmem = kvmppc_trampoline_lowmem;
+	vcpu->arch.trampoline_enter = kvmppc_trampoline_enter;
+	vcpu->arch.highmem_handler = (ulong)kvmppc_handler_highmem;
+
+	vcpu->arch.shadow_msr = MSR_USER64;
+
+	err = __init_new_context();
+	if (err < 0)
+		goto free_vcpu;
+	vcpu_book3s->context_id = err;
+
+	vcpu_book3s->vsid_max = ((vcpu_book3s->context_id + 1) << USER_ESID_BITS) - 1;
+	vcpu_book3s->vsid_first = vcpu_book3s->context_id << USER_ESID_BITS;
+	vcpu_book3s->vsid_next = vcpu_book3s->vsid_first;
+
+	return vcpu;
+
+free_vcpu:
+	free_pages((long)vcpu_book3s, get_order(sizeof(struct kvmppc_vcpu_book3s)));
+out:
+	return ERR_PTR(err);
+}
+
+void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+
+	__destroy_context(vcpu_book3s->context_id);
+	kvm_vcpu_uninit(vcpu);
+	free_pages((long)vcpu_book3s, get_order(sizeof(struct kvmppc_vcpu_book3s)));
+}
+
+extern int __kvmppc_vcpu_entry(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
+int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
+{
+	int ret;
+
+	/* No need to go into the guest when all we would do is come straight back out */
+	if (signal_pending(current)) {
+		kvm_run->exit_reason = KVM_EXIT_INTR;
+		return -EINTR;
+	}
+
+	/* XXX we get called with irq disabled - change that! */
+	local_irq_enable();
+
+	ret = __kvmppc_vcpu_entry(kvm_run, vcpu);
+
+	local_irq_disable();
+
+	return ret;
+}
+
+static int kvmppc_book3s_init(void)
+{
+	return kvm_init(NULL, sizeof(struct kvmppc_vcpu_book3s), THIS_MODULE);
+}
+
+static void kvmppc_book3s_exit(void)
+{
+	kvm_exit();
+}
+
+module_init(kvmppc_book3s_init);
+module_exit(kvmppc_book3s_exit);
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 10/27] Add book3s.c
@ 2009-10-30 15:47     ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Kevin Wolf, Arnd Bergmann, Hollis Blanchard, Marcelo Tosatti,
	kvm-ppc, linuxppc-dev, Avi Kivity, bphilips, Olof Johansson

This adds the book3s core handling file. Everything that is generic to
desktop PowerPC cores is handled here, including interrupt injection,
MSR handling, etc.

It basically takes over the same role as booke.c for embedded PowerPCs.
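
As a reading aid (not part of the patch itself), below is a minimal user-space
sketch of how the register accessors in this file are reached through the
generic KVM ioctl interface. It assumes a vcpu fd already obtained via the
usual KVM_CREATE_VM/KVM_CREATE_VCPU sequence and is purely illustrative:

	/* Illustrative only: read back the guest pc/msr that
	 * kvm_arch_vcpu_ioctl_get_regs() in this file fills in. */
	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	static int dump_guest_state(int vcpu_fd)
	{
		struct kvm_regs regs;

		if (ioctl(vcpu_fd, KVM_GET_REGS, &regs) < 0)
			return -1;

		printf("guest pc=0x%llx msr=0x%llx\n",
		       (unsigned long long)regs.pc,
		       (unsigned long long)regs.msr);
		return 0;
	}

struct kvm_regs here is the powerpc layout (pc, cr, ctr, lr, xer, msr, srr0/1,
the sprgs and gprs) that the get/set regs handlers copy to and from vcpu->arch.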

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v3 -> v4:

  - use context_id instead of mm_alloc

v4 -> v5:

  - make pvr 32 bits

v5 -> v6:

  - use /* */ instead of //
  - use kvm_for_each_vcpu
---
 arch/powerpc/kvm/book3s.c |  925 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 925 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s.c

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
new file mode 100644
index 0000000..42037d4
--- /dev/null
+++ b/arch/powerpc/kvm/book3s.c
@@ -0,0 +1,925 @@
+/*
+ * Copyright (C) 2009. SUSE Linux Products GmbH. All rights reserved.
+ *
+ * Authors:
+ *    Alexander Graf <agraf@suse.de>
+ *    Kevin Wolf <mail@kevin-wolf.de>
+ *
+ * Description:
+ * This file is derived from arch/powerpc/kvm/44x.c,
+ * by Hollis Blanchard <hollisb@us.ibm.com>.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kvm_host.h>
+#include <linux/err.h>
+
+#include <asm/reg.h>
+#include <asm/cputable.h>
+#include <asm/cacheflush.h>
+#include <asm/tlbflush.h>
+#include <asm/uaccess.h>
+#include <asm/io.h>
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+#include <asm/mmu_context.h>
+#include <linux/sched.h>
+#include <linux/vmalloc.h>
+
+#define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU
+
+/* #define EXIT_DEBUG */
+/* #define EXIT_DEBUG_SIMPLE */
+
+/* Without AGGRESSIVE_DEC we only fire off a DEC interrupt when DEC turns 0.
+ * When set, we retrigger a DEC interrupt after that if DEC <= 0.
+ * PPC32 Linux runs faster without AGGRESSIVE_DEC, PPC64 Linux requires it. */
+
+/* #define AGGRESSIVE_DEC */
+
+struct kvm_stats_debugfs_item debugfs_entries[] = {
+	{ "exits",       VCPU_STAT(sum_exits) },
+	{ "mmio",        VCPU_STAT(mmio_exits) },
+	{ "sig",         VCPU_STAT(signal_exits) },
+	{ "sysc",        VCPU_STAT(syscall_exits) },
+	{ "inst_emu",    VCPU_STAT(emulated_inst_exits) },
+	{ "dec",         VCPU_STAT(dec_exits) },
+	{ "ext_intr",    VCPU_STAT(ext_intr_exits) },
+	{ "queue_intr",  VCPU_STAT(queue_intr) },
+	{ "halt_wakeup", VCPU_STAT(halt_wakeup) },
+	{ "pf_storage",  VCPU_STAT(pf_storage) },
+	{ "sp_storage",  VCPU_STAT(sp_storage) },
+	{ "pf_instruc",  VCPU_STAT(pf_instruc) },
+	{ "sp_instruc",  VCPU_STAT(sp_instruc) },
+	{ "ld",          VCPU_STAT(ld) },
+	{ "ld_slow",     VCPU_STAT(ld_slow) },
+	{ "st",          VCPU_STAT(st) },
+	{ "st_slow",     VCPU_STAT(st_slow) },
+	{ NULL }
+};
+
+void kvmppc_core_load_host_debugstate(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvmppc_core_load_guest_debugstate(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+	memcpy(get_paca()->kvm_slb, to_book3s(vcpu)->slb_shadow, sizeof(get_paca()->kvm_slb));
+	get_paca()->kvm_slb_max = to_book3s(vcpu)->slb_shadow_max;
+}
+
+void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
+{
+	memcpy(to_book3s(vcpu)->slb_shadow, get_paca()->kvm_slb, sizeof(get_paca()->kvm_slb));
+	to_book3s(vcpu)->slb_shadow_max = get_paca()->kvm_slb_max;
+}
+
+#if defined(AGGRESSIVE_DEC) || defined(EXIT_DEBUG)
+static u32 kvmppc_get_dec(struct kvm_vcpu *vcpu)
+{
+	u64 jd = mftb() - vcpu->arch.dec_jiffies;
+	return vcpu->arch.dec - jd;
+}
+#endif
+
+void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
+{
+	ulong old_msr = vcpu->arch.msr;
+
+#ifdef EXIT_DEBUG
+	printk(KERN_INFO "KVM: Set MSR to 0x%llx\n", msr);
+#endif
+	msr &= to_book3s(vcpu)->msr_mask;
+	vcpu->arch.msr = msr;
+	vcpu->arch.shadow_msr = msr | MSR_USER32;
+	vcpu->arch.shadow_msr &= ( MSR_VEC | MSR_VSX | MSR_FP | MSR_FE0 |
+				   MSR_USER64 | MSR_SE | MSR_BE | MSR_DE |
+				   MSR_FE1);
+
+	if (msr & (MSR_WE|MSR_POW)) {
+		if (!vcpu->arch.pending_exceptions) {
+			kvm_vcpu_block(vcpu);
+			vcpu->stat.halt_wakeup++;
+		}
+	}
+
+	if (((vcpu->arch.msr & (MSR_IR|MSR_DR)) != (old_msr & (MSR_IR|MSR_DR))) ||
+	    (vcpu->arch.msr & MSR_PR) != (old_msr & MSR_PR)) {
+		kvmppc_mmu_flush_segments(vcpu);
+		kvmppc_mmu_map_segment(vcpu, vcpu->arch.pc);
+	}
+}
+
+void kvmppc_inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 flags)
+{
+	vcpu->arch.srr0 = vcpu->arch.pc;
+	vcpu->arch.srr1 = vcpu->arch.msr | flags;
+	vcpu->arch.pc = to_book3s(vcpu)->hior + vec;
+	vcpu->arch.mmu.reset_msr(vcpu);
+}
+
+void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec)
+{
+	unsigned int prio;
+
+	vcpu->stat.queue_intr++;
+	switch (vec) {
+	case 0x100: prio = BOOK3S_IRQPRIO_SYSTEM_RESET;		break;
+	case 0x200: prio = BOOK3S_IRQPRIO_MACHINE_CHECK;	break;
+	case 0x300: prio = BOOK3S_IRQPRIO_DATA_STORAGE;		break;
+	case 0x380: prio = BOOK3S_IRQPRIO_DATA_SEGMENT;		break;
+	case 0x400: prio = BOOK3S_IRQPRIO_INST_STORAGE;		break;
+	case 0x480: prio = BOOK3S_IRQPRIO_INST_SEGMENT;		break;
+	case 0x500: prio = BOOK3S_IRQPRIO_EXTERNAL;		break;
+	case 0x600: prio = BOOK3S_IRQPRIO_ALIGNMENT;		break;
+	case 0x700: prio = BOOK3S_IRQPRIO_PROGRAM;		break;
+	case 0x800: prio = BOOK3S_IRQPRIO_FP_UNAVAIL;		break;
+	case 0x900: prio = BOOK3S_IRQPRIO_DECREMENTER;		break;
+	case 0xc00: prio = BOOK3S_IRQPRIO_SYSCALL;		break;
+	case 0xd00: prio = BOOK3S_IRQPRIO_DEBUG;		break;
+	case 0xf20: prio = BOOK3S_IRQPRIO_ALTIVEC;		break;
+	case 0xf40: prio = BOOK3S_IRQPRIO_VSX;			break;
+	default:    prio = BOOK3S_IRQPRIO_MAX;			break;
+	}
+
+	set_bit(prio, &vcpu->arch.pending_exceptions);
+#ifdef EXIT_DEBUG
+	printk(KERN_INFO "Queueing interrupt %x\n", vec);
+#endif
+}
+
+
+void kvmppc_core_queue_program(struct kvm_vcpu *vcpu)
+{
+	kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_PROGRAM);
+}
+
+void kvmppc_core_queue_dec(struct kvm_vcpu *vcpu)
+{
+	kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_DECREMENTER);
+}
+
+int kvmppc_core_pending_dec(struct kvm_vcpu *vcpu)
+{
+	return test_bit(BOOK3S_INTERRUPT_DECREMENTER >> 7, &vcpu->arch.pending_exceptions);
+}
+
+void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
+                                struct kvm_interrupt *irq)
+{
+	kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_EXTERNAL);
+}
+
+int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
+{
+	int deliver = 1;
+	int vec = 0;
+
+	switch (priority) {
+	case BOOK3S_IRQPRIO_DECREMENTER:
+		deliver = vcpu->arch.msr & MSR_EE;
+		vec = BOOK3S_INTERRUPT_DECREMENTER;
+		break;
+	case BOOK3S_IRQPRIO_EXTERNAL:
+		deliver = vcpu->arch.msr & MSR_EE;
+		vec = BOOK3S_INTERRUPT_EXTERNAL;
+		break;
+	case BOOK3S_IRQPRIO_SYSTEM_RESET:
+		vec = BOOK3S_INTERRUPT_SYSTEM_RESET;
+		break;
+	case BOOK3S_IRQPRIO_MACHINE_CHECK:
+		vec = BOOK3S_INTERRUPT_MACHINE_CHECK;
+		break;
+	case BOOK3S_IRQPRIO_DATA_STORAGE:
+		vec = BOOK3S_INTERRUPT_DATA_STORAGE;
+		break;
+	case BOOK3S_IRQPRIO_INST_STORAGE:
+		vec = BOOK3S_INTERRUPT_INST_STORAGE;
+		break;
+	case BOOK3S_IRQPRIO_DATA_SEGMENT:
+		vec = BOOK3S_INTERRUPT_DATA_SEGMENT;
+		break;
+	case BOOK3S_IRQPRIO_INST_SEGMENT:
+		vec = BOOK3S_INTERRUPT_INST_SEGMENT;
+		break;
+	case BOOK3S_IRQPRIO_ALIGNMENT:
+		vec = BOOK3S_INTERRUPT_ALIGNMENT;
+		break;
+	case BOOK3S_IRQPRIO_PROGRAM:
+		vec = BOOK3S_INTERRUPT_PROGRAM;
+		break;
+	case BOOK3S_IRQPRIO_VSX:
+		vec = BOOK3S_INTERRUPT_VSX;
+		break;
+	case BOOK3S_IRQPRIO_ALTIVEC:
+		vec = BOOK3S_INTERRUPT_ALTIVEC;
+		break;
+	case BOOK3S_IRQPRIO_FP_UNAVAIL:
+		vec = BOOK3S_INTERRUPT_FP_UNAVAIL;
+		break;
+	case BOOK3S_IRQPRIO_SYSCALL:
+		vec = BOOK3S_INTERRUPT_SYSCALL;
+		break;
+	case BOOK3S_IRQPRIO_DEBUG:
+		vec = BOOK3S_INTERRUPT_TRACE;
+		break;
+	case BOOK3S_IRQPRIO_PERFORMANCE_MONITOR:
+		vec = BOOK3S_INTERRUPT_PERFMON;
+		break;
+	default:
+		deliver = 0;
+		printk(KERN_ERR "KVM: Unknown interrupt: 0x%x\n", priority);
+		break;
+	}
+
+#if 0
+	printk(KERN_INFO "Deliver interrupt 0x%x? %x\n", vec, deliver);
+#endif
+
+	if (deliver)
+		kvmppc_inject_interrupt(vcpu, vec, 0ULL);
+
+	return deliver;
+}
+
+void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu)
+{
+	unsigned long *pending = &vcpu->arch.pending_exceptions;
+	unsigned int priority;
+
+	/* XXX be more clever here - no need to mftb() on every entry */
+	/* Issue DEC again if it's still active */
+#ifdef AGGRESSIVE_DEC
+	if (vcpu->arch.msr & MSR_EE)
+		if (kvmppc_get_dec(vcpu) & 0x80000000)
+			kvmppc_core_queue_dec(vcpu);
+#endif
+
+#ifdef EXIT_DEBUG
+	if (vcpu->arch.pending_exceptions)
+		printk(KERN_EMERG "KVM: Check pending: %lx\n", vcpu->arch.pending_exceptions);
+#endif
+	priority = __ffs(*pending);
+	while (priority <= (sizeof(unsigned int) * 8)) {
+		if (kvmppc_book3s_irqprio_deliver(vcpu, priority)) {
+			clear_bit(priority, &vcpu->arch.pending_exceptions);
+			break;
+		}
+
+		priority = find_next_bit(pending,
+					 BITS_PER_BYTE * sizeof(*pending),
+					 priority + 1);
+	}
+}
+
+void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr)
+{
+	vcpu->arch.pvr = pvr;
+	if ((pvr >= 0x330000) && (pvr < 0x70330000)) {
+		kvmppc_mmu_book3s_64_init(vcpu);
+		to_book3s(vcpu)->hior = 0xfff00000;
+		to_book3s(vcpu)->msr_mask = 0xffffffffffffffffULL;
+	} else {
+		kvmppc_mmu_book3s_32_init(vcpu);
+		to_book3s(vcpu)->hior = 0;
+		to_book3s(vcpu)->msr_mask = 0xffffffffULL;
+	}
+
+	/* If we are running at hypervisor level on a 970, we can tell the CPU to
+	 * treat DCBZ as a 32 byte store */
+	vcpu->arch.hflags &= ~BOOK3S_HFLAG_DCBZ32;
+	if (vcpu->arch.mmu.is_dcbz32(vcpu) && (mfmsr() & MSR_HV) &&
+	    !strcmp(cur_cpu_spec->platform, "ppc970"))
+		vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
+
+}
+
+/* Book3s_32 CPUs always have a 32 byte cache line size, which Linux assumes. To
+ * make Book3s_32 Linux work on Book3s_64, we have to make sure we trap dcbz to
+ * emulate a 32 byte dcbz length.
+ *
+ * The Book3s_64 inventors also realized this case and implemented a special bit
+ * in the HID5 register, which is a hypervisor resource. Thus we can't use it.
+ *
+ * My approach here is to patch the dcbz instructions on executing pages.
+ */
+static void kvmppc_patch_dcbz(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte)
+{
+	bool touched = false;
+	hva_t hpage;
+	u32 *page;
+	int i;
+
+	hpage = gfn_to_hva(vcpu->kvm, pte->raddr >> PAGE_SHIFT);
+	if (kvm_is_error_hva(hpage))
+		return;
+
+	hpage |= pte->raddr & ~PAGE_MASK;
+	hpage &= ~0xFFFULL;
+
+	page = vmalloc(HW_PAGE_SIZE);
+
+	if (copy_from_user(page, (void __user *)hpage, HW_PAGE_SIZE))
+		goto out;
+
+	for (i = 0; i < HW_PAGE_SIZE / 4; i++)
+		if ((page[i] & 0xff0007ff) == INS_DCBZ) {
+			page[i] &= 0xfffffff7; /* reserved instruction, so we trap */
+			touched = true;
+		}
+
+	if (touched)
+		copy_to_user((void __user *)hpage, page, HW_PAGE_SIZE);
+
+out:
+	vfree(page);
+}
+
+static int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, bool data,
+			 struct kvmppc_pte *pte)
+{
+	int relocated = (vcpu->arch.msr & (data ? MSR_DR : MSR_IR));
+	int r;
+
+	if (relocated) {
+		r = vcpu->arch.mmu.xlate(vcpu, eaddr, pte, data);
+	} else {
+		pte->eaddr = eaddr;
+		pte->raddr = eaddr & 0xffffffff;
+		pte->vpage = eaddr >> 12;
+		switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
+		case 0:
+			pte->vpage |= VSID_REAL;
+		case MSR_DR:
+			pte->vpage |= VSID_REAL_DR;
+		case MSR_IR:
+			pte->vpage |= VSID_REAL_IR;
+		}
+		pte->may_read = true;
+		pte->may_write = true;
+		pte->may_execute = true;
+		r = 0;
+	}
+
+	return r;
+}
+
+static hva_t kvmppc_bad_hva(void)
+{
+	return PAGE_OFFSET;
+}
+
+static hva_t kvmppc_pte_to_hva(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte,
+			       bool read)
+{
+	hva_t hpage;
+
+	if (read && !pte->may_read)
+		goto err;
+
+	if (!read && !pte->may_write)
+		goto err;
+
+	hpage = gfn_to_hva(vcpu->kvm, pte->raddr >> PAGE_SHIFT);
+	if (kvm_is_error_hva(hpage))
+		goto err;
+
+	return hpage | (pte->raddr & ~PAGE_MASK);
+err:
+	return kvmppc_bad_hva();
+}
+
+int kvmppc_st(struct kvm_vcpu *vcpu, ulong eaddr, int size, void *ptr)
+{
+	struct kvmppc_pte pte;
+	hva_t hva = eaddr;
+
+	vcpu->stat.st++;
+
+	if (kvmppc_xlate(vcpu, eaddr, false, &pte))
+		goto err;
+
+	hva = kvmppc_pte_to_hva(vcpu, &pte, false);
+	if (kvm_is_error_hva(hva))
+		goto err;
+
+	if (copy_to_user((void __user *)hva, ptr, size)) {
+		printk(KERN_INFO "kvmppc_st at 0x%lx failed\n", hva);
+		goto err;
+	}
+
+	return 0;
+
+err:
+	return -ENOENT;
+}
+
+int kvmppc_ld(struct kvm_vcpu *vcpu, ulong eaddr, int size, void *ptr,
+		      bool data)
+{
+	struct kvmppc_pte pte;
+	hva_t hva = eaddr;
+
+	vcpu->stat.ld++;
+
+	if (kvmppc_xlate(vcpu, eaddr, data, &pte))
+		goto err;
+
+	hva = kvmppc_pte_to_hva(vcpu, &pte, true);
+	if (kvm_is_error_hva(hva))
+		goto err;
+
+	if (copy_from_user(ptr, (void __user *)hva, size)) {
+		printk(KERN_INFO "kvmppc_ld at 0x%lx failed\n", hva);
+		goto err;
+	}
+
+	return 0;
+
+err:
+	return -ENOENT;
+}
+
+static int kvmppc_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+	return kvm_is_visible_gfn(vcpu->kvm, gfn);
+}
+
+int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu,
+			    ulong eaddr, int vec)
+{
+	bool data = (vec == BOOK3S_INTERRUPT_DATA_STORAGE);
+	int r = RESUME_GUEST;
+	int relocated;
+	int page_found = 0;
+	struct kvmppc_pte pte;
+	bool is_mmio = false;
+
+	if ( vec == BOOK3S_INTERRUPT_DATA_STORAGE ) {
+		relocated = (vcpu->arch.msr & MSR_DR);
+	} else {
+		relocated = (vcpu->arch.msr & MSR_IR);
+	}
+
+	/* Resolve real address if translation turned on */
+	if (relocated) {
+		page_found = vcpu->arch.mmu.xlate(vcpu, eaddr, &pte, data);
+	} else {
+		pte.may_execute = true;
+		pte.may_read = true;
+		pte.may_write = true;
+		pte.raddr = eaddr & 0xffffffff;
+		pte.eaddr = eaddr;
+		pte.vpage = eaddr >> 12;
+		switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
+		case 0:
+			pte.vpage |= VSID_REAL;
+		case MSR_DR:
+			pte.vpage |= VSID_REAL_DR;
+		case MSR_IR:
+			pte.vpage |= VSID_REAL_IR;
+		}
+	}
+
+	if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
+	   (!(vcpu->arch.hflags & BOOK3S_HFLAG_DCBZ32))) {
+		/*
+		 * If we do the dcbz hack, we have to NX on every execution,
+		 * so we can patch the executing code. This renders our guest
+		 * NX-less.
+		 */
+		pte.may_execute = !data;
+	}
+
+	if (page_found == -ENOENT) {
+		/* Page not found in guest PTE entries */
+		vcpu->arch.dear = vcpu->arch.fault_dear;
+		to_book3s(vcpu)->dsisr = vcpu->arch.fault_dsisr;
+		vcpu->arch.msr |= (vcpu->arch.shadow_msr & 0x00000000f8000000ULL);
+		kvmppc_book3s_queue_irqprio(vcpu, vec);
+	} else if (page_found == -EPERM) {
+		/* Storage protection */
+		vcpu->arch.dear = vcpu->arch.fault_dear;
+		to_book3s(vcpu)->dsisr = vcpu->arch.fault_dsisr & ~DSISR_NOHPTE;
+		to_book3s(vcpu)->dsisr |= DSISR_PROTFAULT;
+		vcpu->arch.msr |= (vcpu->arch.shadow_msr & 0x00000000f8000000ULL);
+		kvmppc_book3s_queue_irqprio(vcpu, vec);
+	} else if (page_found == -EINVAL) {
+		/* Page not found in guest SLB */
+		vcpu->arch.dear = vcpu->arch.fault_dear;
+		kvmppc_book3s_queue_irqprio(vcpu, vec + 0x80);
+	} else if (!is_mmio &&
+		   kvmppc_visible_gfn(vcpu, pte.raddr >> PAGE_SHIFT)) {
+		/* The guest's PTE is not mapped yet. Map on the host */
+		kvmppc_mmu_map_page(vcpu, &pte);
+		if (data)
+			vcpu->stat.sp_storage++;
+		else if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
+			(!(vcpu->arch.hflags & BOOK3S_HFLAG_DCBZ32)))
+			kvmppc_patch_dcbz(vcpu, &pte);
+	} else {
+		/* MMIO */
+		vcpu->stat.mmio_exits++;
+		vcpu->arch.paddr_accessed = pte.raddr;
+		r = kvmppc_emulate_mmio(run, vcpu);
+		if ( r == RESUME_HOST_NV )
+			r = RESUME_HOST;
+		if ( r == RESUME_GUEST_NV )
+			r = RESUME_GUEST;
+	}
+
+	return r;
+}
+
+int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
+                       unsigned int exit_nr)
+{
+	int r = RESUME_HOST;
+
+	vcpu->stat.sum_exits++;
+
+	run->exit_reason = KVM_EXIT_UNKNOWN;
+	run->ready_for_interrupt_injection = 1;
+#ifdef EXIT_DEBUG
+	printk(KERN_EMERG "exit_nr=0x%x | pc=0x%lx | dar=0x%lx | dec=0x%x | msr=0x%lx\n",
+		exit_nr, vcpu->arch.pc, vcpu->arch.fault_dear,
+		kvmppc_get_dec(vcpu), vcpu->arch.msr);
+#elif defined (EXIT_DEBUG_SIMPLE)
+	if ((exit_nr != 0x900) && (exit_nr != 0x500))
+		printk(KERN_EMERG "exit_nr=0x%x | pc=0x%lx | dar=0x%lx | msr=0x%lx\n",
+			exit_nr, vcpu->arch.pc, vcpu->arch.fault_dear,
+			vcpu->arch.msr);
+#endif
+	kvm_resched(vcpu);
+	switch (exit_nr) {
+	case BOOK3S_INTERRUPT_INST_STORAGE:
+		vcpu->stat.pf_instruc++;
+		/* only care about PTEG not found errors, but leave NX alone */
+		if (vcpu->arch.shadow_msr & 0x40000000) {
+			r = kvmppc_handle_pagefault(run, vcpu, vcpu->arch.pc, exit_nr);
+			vcpu->stat.sp_instruc++;
+		} else if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
+			  (!(vcpu->arch.hflags & BOOK3S_HFLAG_DCBZ32))) {
+			/*
+			 * XXX If we do the dcbz hack we use the NX bit to flush and patch the page,
+			 *     so we can't use the NX bit inside the guest. Let's cross our fingers
+			 *     that no guest that needs the dcbz hack uses NX.
+			 */
+			kvmppc_mmu_pte_flush(vcpu, vcpu->arch.pc, ~0xFFFULL);
+		} else {
+			vcpu->arch.msr |= (vcpu->arch.shadow_msr & 0x58000000);
+			kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+			kvmppc_mmu_pte_flush(vcpu, vcpu->arch.pc, ~0xFFFULL);
+			r = RESUME_GUEST;
+		}
+		break;
+	case BOOK3S_INTERRUPT_DATA_STORAGE:
+		vcpu->stat.pf_storage++;
+		/* The only case we need to handle is missing shadow PTEs */
+		if (vcpu->arch.fault_dsisr & DSISR_NOHPTE) {
+			r = kvmppc_handle_pagefault(run, vcpu, vcpu->arch.fault_dear, exit_nr);
+		} else {
+			vcpu->arch.dear = vcpu->arch.fault_dear;
+			to_book3s(vcpu)->dsisr = vcpu->arch.fault_dsisr;
+			kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+			kvmppc_mmu_pte_flush(vcpu, vcpu->arch.dear, ~0xFFFULL);
+			r = RESUME_GUEST;
+		}
+		break;
+	case BOOK3S_INTERRUPT_DATA_SEGMENT:
+		if (kvmppc_mmu_map_segment(vcpu, vcpu->arch.fault_dear) < 0) {
+			vcpu->arch.dear = vcpu->arch.fault_dear;
+			kvmppc_book3s_queue_irqprio(vcpu,
+				BOOK3S_INTERRUPT_DATA_SEGMENT);
+		}
+		r = RESUME_GUEST;
+		break;
+	case BOOK3S_INTERRUPT_INST_SEGMENT:
+		if (kvmppc_mmu_map_segment(vcpu, vcpu->arch.pc) < 0) {
+			kvmppc_book3s_queue_irqprio(vcpu,
+				BOOK3S_INTERRUPT_INST_SEGMENT);
+		}
+		r = RESUME_GUEST;
+		break;
+	/* We're good on these - the host merely wanted to get our attention */
+	case BOOK3S_INTERRUPT_DECREMENTER:
+		vcpu->stat.dec_exits++;
+		r = RESUME_GUEST;
+		break;
+	case BOOK3S_INTERRUPT_EXTERNAL:
+		vcpu->stat.ext_intr_exits++;
+		r = RESUME_GUEST;
+		break;
+	case BOOK3S_INTERRUPT_PROGRAM:
+	{
+		enum emulation_result er;
+
+		if (vcpu->arch.msr & MSR_PR) {
+#ifdef EXIT_DEBUG
+			printk(KERN_INFO "Userspace triggered 0x700 exception at 0x%lx (0x%x)\n", vcpu->arch.pc, vcpu->arch.last_inst);
+#endif
+			if ((vcpu->arch.last_inst & 0xff0007ff) !=
+			    (INS_DCBZ & 0xfffffff7)) {
+				kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+				r = RESUME_GUEST;
+				break;
+			}
+		}
+
+		vcpu->stat.emulated_inst_exits++;
+		er = kvmppc_emulate_instruction(run, vcpu);
+		switch (er) {
+		case EMULATE_DONE:
+			r = RESUME_GUEST;
+			break;
+		case EMULATE_FAIL:
+			printk(KERN_CRIT "%s: emulation at %lx failed (%08x)\n",
+			       __func__, vcpu->arch.pc, vcpu->arch.last_inst);
+			kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+			r = RESUME_GUEST;
+			break;
+		default:
+			BUG();
+		}
+		break;
+	}
+	case BOOK3S_INTERRUPT_SYSCALL:
+#ifdef EXIT_DEBUG
+		printk(KERN_INFO "Syscall Nr %d\n", (int)vcpu->arch.gpr[0]);
+#endif
+		vcpu->stat.syscall_exits++;
+		kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+		r = RESUME_GUEST;
+		break;
+	case BOOK3S_INTERRUPT_MACHINE_CHECK:
+	case BOOK3S_INTERRUPT_FP_UNAVAIL:
+	case BOOK3S_INTERRUPT_TRACE:
+	case BOOK3S_INTERRUPT_ALTIVEC:
+	case BOOK3S_INTERRUPT_VSX:
+		kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+		r = RESUME_GUEST;
+		break;
+	default:
+		/* Ugh - bork here! What did we get? */
+		printk(KERN_EMERG "exit_nr=0x%x | pc=0x%lx | msr=0x%lx\n", exit_nr, vcpu->arch.pc, vcpu->arch.shadow_msr);
+		r = RESUME_HOST;
+		BUG();
+		break;
+	}
+
+
+	if (!(r & RESUME_HOST)) {
+		/* To avoid clobbering exit_reason, only check for signals if
+		 * we aren't already exiting to userspace for some other
+		 * reason. */
+		if (signal_pending(current)) {
+#ifdef EXIT_DEBUG
+			printk(KERN_EMERG "KVM: Going back to host\n");
+#endif
+			vcpu->stat.signal_exits++;
+			run->exit_reason = KVM_EXIT_INTR;
+			r = -EINTR;
+		} else {
+			/* In case an interrupt came in that was triggered
+			 * from userspace (like DEC), we need to check what
+			 * to inject now! */
+			kvmppc_core_deliver_interrupts(vcpu);
+		}
+	}
+
+#ifdef EXIT_DEBUG
+	printk(KERN_EMERG "KVM exit: vcpu=0x%p pc=0x%lx r=0x%x\n", vcpu, vcpu->arch.pc, r);
+#endif
+
+	return r;
+}
+
+int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	int i;
+
+	regs->pc = vcpu->arch.pc;
+	regs->cr = vcpu->arch.cr;
+	regs->ctr = vcpu->arch.ctr;
+	regs->lr = vcpu->arch.lr;
+	regs->xer = vcpu->arch.xer;
+	regs->msr = vcpu->arch.msr;
+	regs->srr0 = vcpu->arch.srr0;
+	regs->srr1 = vcpu->arch.srr1;
+	regs->pid = vcpu->arch.pid;
+	regs->sprg0 = vcpu->arch.sprg0;
+	regs->sprg1 = vcpu->arch.sprg1;
+	regs->sprg2 = vcpu->arch.sprg2;
+	regs->sprg3 = vcpu->arch.sprg3;
+	regs->sprg5 = vcpu->arch.sprg5;
+	regs->sprg6 = vcpu->arch.sprg6;
+	regs->sprg7 = vcpu->arch.sprg7;
+
+	for (i = 0; i < ARRAY_SIZE(regs->gpr); i++)
+		regs->gpr[i] = vcpu->arch.gpr[i];
+
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	int i;
+
+	vcpu->arch.pc = regs->pc;
+	vcpu->arch.cr = regs->cr;
+	vcpu->arch.ctr = regs->ctr;
+	vcpu->arch.lr = regs->lr;
+	vcpu->arch.xer = regs->xer;
+	kvmppc_set_msr(vcpu, regs->msr);
+	vcpu->arch.srr0 = regs->srr0;
+	vcpu->arch.srr1 = regs->srr1;
+	vcpu->arch.sprg0 = regs->sprg0;
+	vcpu->arch.sprg1 = regs->sprg1;
+	vcpu->arch.sprg2 = regs->sprg2;
+	vcpu->arch.sprg3 = regs->sprg3;
+	vcpu->arch.sprg5 = regs->sprg5;
+	vcpu->arch.sprg6 = regs->sprg6;
+	vcpu->arch.sprg7 = regs->sprg7;
+
+	for (i = 0; i < ARRAY_SIZE(vcpu->arch.gpr); i++)
+		vcpu->arch.gpr[i] = regs->gpr[i];
+
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
+                                  struct kvm_sregs *sregs)
+{
+	sregs->pvr = vcpu->arch.pvr;
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
+                                  struct kvm_sregs *sregs)
+{
+	kvmppc_set_pvr(vcpu, sregs->pvr);
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -ENOTSUPP;
+}
+
+int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -ENOTSUPP;
+}
+
+int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
+                                  struct kvm_translation *tr)
+{
+	return 0;
+}
+
+/*
+ * Get (and clear) the dirty memory log for a memory slot.
+ */
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
+				      struct kvm_dirty_log *log)
+{
+	struct kvm_memory_slot *memslot;
+	struct kvm_vcpu *vcpu;
+	ulong ga, ga_end;
+	int is_dirty = 0;
+	int r, n;
+
+	down_write(&kvm->slots_lock);
+
+	r = kvm_get_dirty_log(kvm, log, &is_dirty);
+	if (r)
+		goto out;
+
+	/* If nothing is dirty, don't bother messing with page tables. */
+	if (is_dirty) {
+		memslot = &kvm->memslots[log->slot];
+
+		ga = memslot->base_gfn << PAGE_SHIFT;
+		ga_end = ga + (memslot->npages << PAGE_SHIFT);
+
+		kvm_for_each_vcpu(n, vcpu, kvm)
+			kvmppc_mmu_pte_pflush(vcpu, ga, ga_end);
+
+		n = ALIGN(memslot->npages, BITS_PER_LONG) / 8;
+		memset(memslot->dirty_bitmap, 0, n);
+	}
+
+	r = 0;
+out:
+	up_write(&kvm->slots_lock);
+	return r;
+}
+
+int kvmppc_core_check_processor_compat(void)
+{
+	return 0;
+}
+
+struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s;
+	struct kvm_vcpu *vcpu;
+	int err;
+
+	vcpu_book3s = (struct kvmppc_vcpu_book3s *)__get_free_pages( GFP_KERNEL | __GFP_ZERO,
+			get_order(sizeof(struct kvmppc_vcpu_book3s)));
+	if (!vcpu_book3s) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	vcpu = &vcpu_book3s->vcpu;
+	err = kvm_vcpu_init(vcpu, kvm, id);
+	if (err)
+		goto free_vcpu;
+
+	vcpu->arch.host_retip = kvm_return_point;
+	vcpu->arch.host_msr = mfmsr();
+	/* default to book3s_64 (970fx) */
+	vcpu->arch.pvr = 0x3C0301;
+	kvmppc_set_pvr(vcpu, vcpu->arch.pvr);
+	vcpu_book3s->slb_nr = 64;
+
+	/* remember where some real-mode handlers are */
+	vcpu->arch.trampoline_lowmem = kvmppc_trampoline_lowmem;
+	vcpu->arch.trampoline_enter = kvmppc_trampoline_enter;
+	vcpu->arch.highmem_handler = (ulong)kvmppc_handler_highmem;
+
+	vcpu->arch.shadow_msr = MSR_USER64;
+
+	err = __init_new_context();
+	if (err < 0)
+		goto free_vcpu;
+	vcpu_book3s->context_id = err;
+
+	vcpu_book3s->vsid_max = ((vcpu_book3s->context_id + 1) << USER_ESID_BITS) - 1;
+	vcpu_book3s->vsid_first = vcpu_book3s->context_id << USER_ESID_BITS;
+	vcpu_book3s->vsid_next = vcpu_book3s->vsid_first;
+
+	return vcpu;
+
+free_vcpu:
+	free_pages((long)vcpu_book3s, get_order(sizeof(struct kvmppc_vcpu_book3s)));
+out:
+	return ERR_PTR(err);
+}
+
+void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+
+	__destroy_context(vcpu_book3s->context_id);
+	kvm_vcpu_uninit(vcpu);
+	free_pages((long)vcpu_book3s, get_order(sizeof(struct kvmppc_vcpu_book3s)));
+}
+
+extern int __kvmppc_vcpu_entry(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
+int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
+{
+	int ret;
+
+	/* No need to go into the guest when all we would do is come straight back out */
+	if (signal_pending(current)) {
+		kvm_run->exit_reason = KVM_EXIT_INTR;
+		return -EINTR;
+	}
+
+	/* XXX we get called with irq disabled - change that! */
+	local_irq_enable();
+
+	ret = __kvmppc_vcpu_entry(kvm_run, vcpu);
+
+	local_irq_disable();
+
+	return ret;
+}
+
+static int kvmppc_book3s_init(void)
+{
+	return kvm_init(NULL, sizeof(struct kvmppc_vcpu_book3s), THIS_MODULE);
+}
+
+static void kvmppc_book3s_exit(void)
+{
+	kvm_exit();
+}
+
+module_init(kvmppc_book3s_init);
+module_exit(kvmppc_book3s_exit);
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 10/27] Add book3s.c
@ 2009-10-30 15:47     ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti, Olof Johansson,
	linuxppc-dev-mnsaURCQ41sdnm+yROfE0A

This adds the book3s core handling file. Here everything that is generic to
desktop PowerPC cores is handled, including interrupt injections, MSR settings,
etc.

It basically takes over the same role as booke.c for embedded PowerPCs.

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v3 -> v4:

  - use context_id instead of mm_alloc

v4 -> v5:

  - make pvr 32 bits

v5 -> v6:

  - use /* */ instead of //
  - use kvm_for_each_vcpu
---
 arch/powerpc/kvm/book3s.c |  925 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 925 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s.c

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
new file mode 100644
index 0000000..42037d4
--- /dev/null
+++ b/arch/powerpc/kvm/book3s.c
@@ -0,0 +1,925 @@
+/*
+ * Copyright (C) 2009. SUSE Linux Products GmbH. All rights reserved.
+ *
+ * Authors:
+ *    Alexander Graf <agraf@suse.de>
+ *    Kevin Wolf <mail@kevin-wolf.de>
+ *
+ * Description:
+ * This file is derived from arch/powerpc/kvm/44x.c,
+ * by Hollis Blanchard <hollisb@us.ibm.com>.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kvm_host.h>
+#include <linux/err.h>
+
+#include <asm/reg.h>
+#include <asm/cputable.h>
+#include <asm/cacheflush.h>
+#include <asm/tlbflush.h>
+#include <asm/uaccess.h>
+#include <asm/io.h>
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+#include <asm/mmu_context.h>
+#include <linux/sched.h>
+#include <linux/vmalloc.h>
+
+#define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU
+
+/* #define EXIT_DEBUG */
+/* #define EXIT_DEBUG_SIMPLE */
+
+/* Without AGGRESSIVE_DEC we only fire off a DEC interrupt when DEC turns 0.
+ * When set, we retrigger a DEC interrupt after that if DEC <= 0.
+ * PPC32 Linux runs faster without AGGRESSIVE_DEC, PPC64 Linux requires it. */
+
+/* #define AGGRESSIVE_DEC */
+
+struct kvm_stats_debugfs_item debugfs_entries[] = {
+	{ "exits",       VCPU_STAT(sum_exits) },
+	{ "mmio",        VCPU_STAT(mmio_exits) },
+	{ "sig",         VCPU_STAT(signal_exits) },
+	{ "sysc",        VCPU_STAT(syscall_exits) },
+	{ "inst_emu",    VCPU_STAT(emulated_inst_exits) },
+	{ "dec",         VCPU_STAT(dec_exits) },
+	{ "ext_intr",    VCPU_STAT(ext_intr_exits) },
+	{ "queue_intr",  VCPU_STAT(queue_intr) },
+	{ "halt_wakeup", VCPU_STAT(halt_wakeup) },
+	{ "pf_storage",  VCPU_STAT(pf_storage) },
+	{ "sp_storage",  VCPU_STAT(sp_storage) },
+	{ "pf_instruc",  VCPU_STAT(pf_instruc) },
+	{ "sp_instruc",  VCPU_STAT(sp_instruc) },
+	{ "ld",          VCPU_STAT(ld) },
+	{ "ld_slow",     VCPU_STAT(ld_slow) },
+	{ "st",          VCPU_STAT(st) },
+	{ "st_slow",     VCPU_STAT(st_slow) },
+	{ NULL }
+};
+
+void kvmppc_core_load_host_debugstate(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvmppc_core_load_guest_debugstate(struct kvm_vcpu *vcpu)
+{
+}
+
+void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+	memcpy(get_paca()->kvm_slb, to_book3s(vcpu)->slb_shadow, sizeof(get_paca()->kvm_slb));
+	get_paca()->kvm_slb_max = to_book3s(vcpu)->slb_shadow_max;
+}
+
+void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
+{
+	memcpy(to_book3s(vcpu)->slb_shadow, get_paca()->kvm_slb, sizeof(get_paca()->kvm_slb));
+	to_book3s(vcpu)->slb_shadow_max = get_paca()->kvm_slb_max;
+}
+
+#if defined(AGGRESSIVE_DEC) || defined(EXIT_DEBUG)
+static u32 kvmppc_get_dec(struct kvm_vcpu *vcpu)
+{
+	u64 jd = mftb() - vcpu->arch.dec_jiffies;
+	return vcpu->arch.dec - jd;
+}
+#endif
+
+void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
+{
+	ulong old_msr = vcpu->arch.msr;
+
+#ifdef EXIT_DEBUG
+	printk(KERN_INFO "KVM: Set MSR to 0x%llx\n", msr);
+#endif
+	msr &= to_book3s(vcpu)->msr_mask;
+	vcpu->arch.msr = msr;
+	vcpu->arch.shadow_msr = msr | MSR_USER32;
+	vcpu->arch.shadow_msr &= ( MSR_VEC | MSR_VSX | MSR_FP | MSR_FE0 |
+				   MSR_USER64 | MSR_SE | MSR_BE | MSR_DE |
+				   MSR_FE1);
+
+	if (msr & (MSR_WE|MSR_POW)) {
+		if (!vcpu->arch.pending_exceptions) {
+			kvm_vcpu_block(vcpu);
+			vcpu->stat.halt_wakeup++;
+		}
+	}
+
+	if (((vcpu->arch.msr & (MSR_IR|MSR_DR)) != (old_msr & (MSR_IR|MSR_DR))) ||
+	    (vcpu->arch.msr & MSR_PR) != (old_msr & MSR_PR)) {
+		kvmppc_mmu_flush_segments(vcpu);
+		kvmppc_mmu_map_segment(vcpu, vcpu->arch.pc);
+	}
+}
+
+void kvmppc_inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 flags)
+{
+	vcpu->arch.srr0 = vcpu->arch.pc;
+	vcpu->arch.srr1 = vcpu->arch.msr | flags;
+	vcpu->arch.pc = to_book3s(vcpu)->hior + vec;
+	vcpu->arch.mmu.reset_msr(vcpu);
+}
+
+void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec)
+{
+	unsigned int prio;
+
+	vcpu->stat.queue_intr++;
+	switch (vec) {
+	case 0x100: prio = BOOK3S_IRQPRIO_SYSTEM_RESET;		break;
+	case 0x200: prio = BOOK3S_IRQPRIO_MACHINE_CHECK;	break;
+	case 0x300: prio = BOOK3S_IRQPRIO_DATA_STORAGE;		break;
+	case 0x380: prio = BOOK3S_IRQPRIO_DATA_SEGMENT;		break;
+	case 0x400: prio = BOOK3S_IRQPRIO_INST_STORAGE;		break;
+	case 0x480: prio = BOOK3S_IRQPRIO_INST_SEGMENT;		break;
+	case 0x500: prio = BOOK3S_IRQPRIO_EXTERNAL;		break;
+	case 0x600: prio = BOOK3S_IRQPRIO_ALIGNMENT;		break;
+	case 0x700: prio = BOOK3S_IRQPRIO_PROGRAM;		break;
+	case 0x800: prio = BOOK3S_IRQPRIO_FP_UNAVAIL;		break;
+	case 0x900: prio = BOOK3S_IRQPRIO_DECREMENTER;		break;
+	case 0xc00: prio = BOOK3S_IRQPRIO_SYSCALL;		break;
+	case 0xd00: prio = BOOK3S_IRQPRIO_DEBUG;		break;
+	case 0xf20: prio = BOOK3S_IRQPRIO_ALTIVEC;		break;
+	case 0xf40: prio = BOOK3S_IRQPRIO_VSX;			break;
+	default:    prio = BOOK3S_IRQPRIO_MAX;			break;
+	}
+
+	set_bit(prio, &vcpu->arch.pending_exceptions);
+#ifdef EXIT_DEBUG
+	printk(KERN_INFO "Queueing interrupt %x\n", vec);
+#endif
+}
+
+
+void kvmppc_core_queue_program(struct kvm_vcpu *vcpu)
+{
+	kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_PROGRAM);
+}
+
+void kvmppc_core_queue_dec(struct kvm_vcpu *vcpu)
+{
+	kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_DECREMENTER);
+}
+
+int kvmppc_core_pending_dec(struct kvm_vcpu *vcpu)
+{
+	return test_bit(BOOK3S_INTERRUPT_DECREMENTER >> 7, &vcpu->arch.pending_exceptions);
+}
+
+void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
+                                struct kvm_interrupt *irq)
+{
+	kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_EXTERNAL);
+}
+
+int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
+{
+	int deliver = 1;
+	int vec = 0;
+
+	switch (priority) {
+	case BOOK3S_IRQPRIO_DECREMENTER:
+		deliver = vcpu->arch.msr & MSR_EE;
+		vec = BOOK3S_INTERRUPT_DECREMENTER;
+		break;
+	case BOOK3S_IRQPRIO_EXTERNAL:
+		deliver = vcpu->arch.msr & MSR_EE;
+		vec = BOOK3S_INTERRUPT_EXTERNAL;
+		break;
+	case BOOK3S_IRQPRIO_SYSTEM_RESET:
+		vec = BOOK3S_INTERRUPT_SYSTEM_RESET;
+		break;
+	case BOOK3S_IRQPRIO_MACHINE_CHECK:
+		vec = BOOK3S_INTERRUPT_MACHINE_CHECK;
+		break;
+	case BOOK3S_IRQPRIO_DATA_STORAGE:
+		vec = BOOK3S_INTERRUPT_DATA_STORAGE;
+		break;
+	case BOOK3S_IRQPRIO_INST_STORAGE:
+		vec = BOOK3S_INTERRUPT_INST_STORAGE;
+		break;
+	case BOOK3S_IRQPRIO_DATA_SEGMENT:
+		vec = BOOK3S_INTERRUPT_DATA_SEGMENT;
+		break;
+	case BOOK3S_IRQPRIO_INST_SEGMENT:
+		vec = BOOK3S_INTERRUPT_INST_SEGMENT;
+		break;
+	case BOOK3S_IRQPRIO_ALIGNMENT:
+		vec = BOOK3S_INTERRUPT_ALIGNMENT;
+		break;
+	case BOOK3S_IRQPRIO_PROGRAM:
+		vec = BOOK3S_INTERRUPT_PROGRAM;
+		break;
+	case BOOK3S_IRQPRIO_VSX:
+		vec = BOOK3S_INTERRUPT_VSX;
+		break;
+	case BOOK3S_IRQPRIO_ALTIVEC:
+		vec = BOOK3S_INTERRUPT_ALTIVEC;
+		break;
+	case BOOK3S_IRQPRIO_FP_UNAVAIL:
+		vec = BOOK3S_INTERRUPT_FP_UNAVAIL;
+		break;
+	case BOOK3S_IRQPRIO_SYSCALL:
+		vec = BOOK3S_INTERRUPT_SYSCALL;
+		break;
+	case BOOK3S_IRQPRIO_DEBUG:
+		vec = BOOK3S_INTERRUPT_TRACE;
+		break;
+	case BOOK3S_IRQPRIO_PERFORMANCE_MONITOR:
+		vec = BOOK3S_INTERRUPT_PERFMON;
+		break;
+	default:
+		deliver = 0;
+		printk(KERN_ERR "KVM: Unknown interrupt: 0x%x\n", priority);
+		break;
+	}
+
+#if 0
+	printk(KERN_INFO "Deliver interrupt 0x%x? %x\n", vec, deliver);
+#endif
+
+	if (deliver)
+		kvmppc_inject_interrupt(vcpu, vec, 0ULL);
+
+	return deliver;
+}
+
+void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu)
+{
+	unsigned long *pending = &vcpu->arch.pending_exceptions;
+	unsigned int priority;
+
+	/* XXX be more clever here - no need to mftb() on every entry */
+	/* Issue DEC again if it's still active */
+#ifdef AGGRESSIVE_DEC
+	if (vcpu->arch.msr & MSR_EE)
+		if (kvmppc_get_dec(vcpu) & 0x80000000)
+			kvmppc_core_queue_dec(vcpu);
+#endif
+
+#ifdef EXIT_DEBUG
+	if (vcpu->arch.pending_exceptions)
+		printk(KERN_EMERG "KVM: Check pending: %lx\n", vcpu->arch.pending_exceptions);
+#endif
+	priority = __ffs(*pending);
+	while (priority <= (sizeof(unsigned int) * 8)) {
+		if (kvmppc_book3s_irqprio_deliver(vcpu, priority)) {
+			clear_bit(priority, &vcpu->arch.pending_exceptions);
+			break;
+		}
+
+		priority = find_next_bit(pending,
+					 BITS_PER_BYTE * sizeof(*pending),
+					 priority + 1);
+	}
+}
+
+void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr)
+{
+	vcpu->arch.pvr = pvr;
+	if ((pvr >= 0x330000) && (pvr < 0x70330000)) {
+		kvmppc_mmu_book3s_64_init(vcpu);
+		to_book3s(vcpu)->hior = 0xfff00000;
+		to_book3s(vcpu)->msr_mask = 0xffffffffffffffffULL;
+	} else {
+		kvmppc_mmu_book3s_32_init(vcpu);
+		to_book3s(vcpu)->hior = 0;
+		to_book3s(vcpu)->msr_mask = 0xffffffffULL;
+	}
+
+	/* If we are in hypervisor level on 970, we can tell the CPU to
+	 * treat DCBZ as 32 bytes store */
+	vcpu->arch.hflags &= ~BOOK3S_HFLAG_DCBZ32;
+	if (vcpu->arch.mmu.is_dcbz32(vcpu) && (mfmsr() & MSR_HV) &&
+	    !strcmp(cur_cpu_spec->platform, "ppc970"))
+		vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
+
+}
+
+/* Book3s_32 CPUs always have 32 bytes cache line size, which Linux assumes. To
+ * make Book3s_32 Linux work on Book3s_64, we have to make sure we trap dcbz to
+ * emulate 32 bytes dcbz length.
+ *
+ * The Book3s_64 inventors also realized this case and implemented a special bit
+ * in the HID5 register, which is a hypervisor ressource. Thus we can't use it.
+ *
+ * My approach here is to patch the dcbz instruction on executing pages.
+ */
+static void kvmppc_patch_dcbz(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte)
+{
+	bool touched = false;
+	hva_t hpage;
+	u32 *page;
+	int i;
+
+	hpage = gfn_to_hva(vcpu->kvm, pte->raddr >> PAGE_SHIFT);
+	if (kvm_is_error_hva(hpage))
+		return;
+
+	hpage |= pte->raddr & ~PAGE_MASK;
+	hpage &= ~0xFFFULL;
+
+	page = vmalloc(HW_PAGE_SIZE);
+
+	if (copy_from_user(page, (void __user *)hpage, HW_PAGE_SIZE))
+		goto out;
+
+	for (i=0; i < HW_PAGE_SIZE / 4; i++)
+		if ((page[i] & 0xff0007ff) = INS_DCBZ) {
+			page[i] &= 0xfffffff7; // reserved instruction, so we trap
+			touched = true;
+		}
+
+	if (touched)
+		copy_to_user((void __user *)hpage, page, HW_PAGE_SIZE);
+
+out:
+	vfree(page);
+}
+
+static int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, bool data,
+			 struct kvmppc_pte *pte)
+{
+	int relocated = (vcpu->arch.msr & (data ? MSR_DR : MSR_IR));
+	int r;
+
+	if (relocated) {
+		r = vcpu->arch.mmu.xlate(vcpu, eaddr, pte, data);
+	} else {
+		pte->eaddr = eaddr;
+		pte->raddr = eaddr & 0xffffffff;
+		pte->vpage = eaddr >> 12;
+		switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
+		case 0:
+			pte->vpage |= VSID_REAL;
+		case MSR_DR:
+			pte->vpage |= VSID_REAL_DR;
+		case MSR_IR:
+			pte->vpage |= VSID_REAL_IR;
+		}
+		pte->may_read = true;
+		pte->may_write = true;
+		pte->may_execute = true;
+		r = 0;
+	}
+
+	return r;
+}
+
+static hva_t kvmppc_bad_hva(void)
+{
+	return PAGE_OFFSET;
+}
+
+static hva_t kvmppc_pte_to_hva(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte,
+			       bool read)
+{
+	hva_t hpage;
+
+	if (read && !pte->may_read)
+		goto err;
+
+	if (!read && !pte->may_write)
+		goto err;
+
+	hpage = gfn_to_hva(vcpu->kvm, pte->raddr >> PAGE_SHIFT);
+	if (kvm_is_error_hva(hpage))
+		goto err;
+
+	return hpage | (pte->raddr & ~PAGE_MASK);
+err:
+	return kvmppc_bad_hva();
+}
+
+int kvmppc_st(struct kvm_vcpu *vcpu, ulong eaddr, int size, void *ptr)
+{
+	struct kvmppc_pte pte;
+	hva_t hva = eaddr;
+
+	vcpu->stat.st++;
+
+	if (kvmppc_xlate(vcpu, eaddr, false, &pte))
+		goto err;
+
+	hva = kvmppc_pte_to_hva(vcpu, &pte, false);
+	if (kvm_is_error_hva(hva))
+		goto err;
+
+	if (copy_to_user((void __user *)hva, ptr, size)) {
+		printk(KERN_INFO "kvmppc_st at 0x%lx failed\n", hva);
+		goto err;
+	}
+
+	return 0;
+
+err:
+	return -ENOENT;
+}
+
+int kvmppc_ld(struct kvm_vcpu *vcpu, ulong eaddr, int size, void *ptr,
+		      bool data)
+{
+	struct kvmppc_pte pte;
+	hva_t hva = eaddr;
+
+	vcpu->stat.ld++;
+
+	if (kvmppc_xlate(vcpu, eaddr, data, &pte))
+		goto err;
+
+	hva = kvmppc_pte_to_hva(vcpu, &pte, true);
+	if (kvm_is_error_hva(hva))
+		goto err;
+
+	if (copy_from_user(ptr, (void __user *)hva, size)) {
+		printk(KERN_INFO "kvmppc_ld at 0x%lx failed\n", hva);
+		goto err;
+	}
+
+	return 0;
+
+err:
+	return -ENOENT;
+}
+
+static int kvmppc_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+	return kvm_is_visible_gfn(vcpu->kvm, gfn);
+}
+
+int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu,
+			    ulong eaddr, int vec)
+{
+	bool data = (vec = BOOK3S_INTERRUPT_DATA_STORAGE);
+	int r = RESUME_GUEST;
+	int relocated;
+	int page_found = 0;
+	struct kvmppc_pte pte;
+	bool is_mmio = false;
+
+	if ( vec = BOOK3S_INTERRUPT_DATA_STORAGE ) {
+		relocated = (vcpu->arch.msr & MSR_DR);
+	} else {
+		relocated = (vcpu->arch.msr & MSR_IR);
+	}
+
+	/* Resolve real address if translation turned on */
+	if (relocated) {
+		page_found = vcpu->arch.mmu.xlate(vcpu, eaddr, &pte, data);
+	} else {
+		pte.may_execute = true;
+		pte.may_read = true;
+		pte.may_write = true;
+		pte.raddr = eaddr & 0xffffffff;
+		pte.eaddr = eaddr;
+		pte.vpage = eaddr >> 12;
+		switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
+		case 0:
+			pte.vpage |= VSID_REAL;
+		case MSR_DR:
+			pte.vpage |= VSID_REAL_DR;
+		case MSR_IR:
+			pte.vpage |= VSID_REAL_IR;
+		}
+	}
+
+	if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
+	   (!(vcpu->arch.hflags & BOOK3S_HFLAG_DCBZ32))) {
+		/*
+		 * If we do the dcbz hack, we have to NX on every execution,
+		 * so we can patch the executing code. This renders our guest
+		 * NX-less.
+		 */
+		pte.may_execute = !data;
+	}
+
+	if (page_found = -ENOENT) {
+		/* Page not found in guest PTE entries */
+		vcpu->arch.dear = vcpu->arch.fault_dear;
+		to_book3s(vcpu)->dsisr = vcpu->arch.fault_dsisr;
+		vcpu->arch.msr |= (vcpu->arch.shadow_msr & 0x00000000f8000000ULL);
+		kvmppc_book3s_queue_irqprio(vcpu, vec);
+	} else if (page_found = -EPERM) {
+		/* Storage protection */
+		vcpu->arch.dear = vcpu->arch.fault_dear;
+		to_book3s(vcpu)->dsisr = vcpu->arch.fault_dsisr & ~DSISR_NOHPTE;
+		to_book3s(vcpu)->dsisr |= DSISR_PROTFAULT;
+		vcpu->arch.msr |= (vcpu->arch.shadow_msr & 0x00000000f8000000ULL);
+		kvmppc_book3s_queue_irqprio(vcpu, vec);
+	} else if (page_found = -EINVAL) {
+		/* Page not found in guest SLB */
+		vcpu->arch.dear = vcpu->arch.fault_dear;
+		kvmppc_book3s_queue_irqprio(vcpu, vec + 0x80);
+	} else if (!is_mmio &&
+		   kvmppc_visible_gfn(vcpu, pte.raddr >> PAGE_SHIFT)) {
+		/* The guest's PTE is not mapped yet. Map on the host */
+		kvmppc_mmu_map_page(vcpu, &pte);
+		if (data)
+			vcpu->stat.sp_storage++;
+		else if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
+			(!(vcpu->arch.hflags & BOOK3S_HFLAG_DCBZ32)))
+			kvmppc_patch_dcbz(vcpu, &pte);
+	} else {
+		/* MMIO */
+		vcpu->stat.mmio_exits++;
+		vcpu->arch.paddr_accessed = pte.raddr;
+		r = kvmppc_emulate_mmio(run, vcpu);
+		if ( r = RESUME_HOST_NV )
+			r = RESUME_HOST;
+		if ( r = RESUME_GUEST_NV )
+			r = RESUME_GUEST;
+	}
+
+	return r;
+}
+
+int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
+                       unsigned int exit_nr)
+{
+	int r = RESUME_HOST;
+
+	vcpu->stat.sum_exits++;
+
+	run->exit_reason = KVM_EXIT_UNKNOWN;
+	run->ready_for_interrupt_injection = 1;
+#ifdef EXIT_DEBUG
+	printk(KERN_EMERG "exit_nr=0x%x | pc=0x%lx | dar=0x%lx | dec=0x%x | msr=0x%lx\n",
+		exit_nr, vcpu->arch.pc, vcpu->arch.fault_dear,
+		kvmppc_get_dec(vcpu), vcpu->arch.msr);
+#elif defined (EXIT_DEBUG_SIMPLE)
+	if ((exit_nr != 0x900) && (exit_nr != 0x500))
+		printk(KERN_EMERG "exit_nr=0x%x | pc=0x%lx | dar=0x%lx | msr=0x%lx\n",
+			exit_nr, vcpu->arch.pc, vcpu->arch.fault_dear,
+			vcpu->arch.msr);
+#endif
+	kvm_resched(vcpu);
+	switch (exit_nr) {
+	case BOOK3S_INTERRUPT_INST_STORAGE:
+		vcpu->stat.pf_instruc++;
+		/* only care about PTEG not found errors, but leave NX alone */
+		if (vcpu->arch.shadow_msr & 0x40000000) {
+			r = kvmppc_handle_pagefault(run, vcpu, vcpu->arch.pc, exit_nr);
+			vcpu->stat.sp_instruc++;
+		} else if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
+			  (!(vcpu->arch.hflags & BOOK3S_HFLAG_DCBZ32))) {
+			/*
+			 * XXX If we do the dcbz hack we use the NX bit to flush&patch the page,
+			 *     so we can't use the NX bit inside the guest. Let's cross our fingers,
+			 *     that no guest that needs the dcbz hack does NX.
+			 */
+			kvmppc_mmu_pte_flush(vcpu, vcpu->arch.pc, ~0xFFFULL);
+		} else {
+			vcpu->arch.msr |= (vcpu->arch.shadow_msr & 0x58000000);
+			kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+			kvmppc_mmu_pte_flush(vcpu, vcpu->arch.pc, ~0xFFFULL);
+			r = RESUME_GUEST;
+		}
+		break;
+	case BOOK3S_INTERRUPT_DATA_STORAGE:
+		vcpu->stat.pf_storage++;
+		/* The only case we need to handle is missing shadow PTEs */
+		if (vcpu->arch.fault_dsisr & DSISR_NOHPTE) {
+			r = kvmppc_handle_pagefault(run, vcpu, vcpu->arch.fault_dear, exit_nr);
+		} else {
+			vcpu->arch.dear = vcpu->arch.fault_dear;
+			to_book3s(vcpu)->dsisr = vcpu->arch.fault_dsisr;
+			kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+			kvmppc_mmu_pte_flush(vcpu, vcpu->arch.dear, ~0xFFFULL);
+			r = RESUME_GUEST;
+		}
+		break;
+	case BOOK3S_INTERRUPT_DATA_SEGMENT:
+		if (kvmppc_mmu_map_segment(vcpu, vcpu->arch.fault_dear) < 0) {
+			vcpu->arch.dear = vcpu->arch.fault_dear;
+			kvmppc_book3s_queue_irqprio(vcpu,
+				BOOK3S_INTERRUPT_DATA_SEGMENT);
+		}
+		r = RESUME_GUEST;
+		break;
+	case BOOK3S_INTERRUPT_INST_SEGMENT:
+		if (kvmppc_mmu_map_segment(vcpu, vcpu->arch.pc) < 0) {
+			kvmppc_book3s_queue_irqprio(vcpu,
+				BOOK3S_INTERRUPT_INST_SEGMENT);
+		}
+		r = RESUME_GUEST;
+		break;
+	/* We're good on these - the host merely wanted to get our attention */
+	case BOOK3S_INTERRUPT_DECREMENTER:
+		vcpu->stat.dec_exits++;
+		r = RESUME_GUEST;
+		break;
+	case BOOK3S_INTERRUPT_EXTERNAL:
+		vcpu->stat.ext_intr_exits++;
+		r = RESUME_GUEST;
+		break;
+	case BOOK3S_INTERRUPT_PROGRAM:
+	{
+		enum emulation_result er;
+
+		if (vcpu->arch.msr & MSR_PR) {
+#ifdef EXIT_DEBUG
+			printk(KERN_INFO "Userspace triggered 0x700 exception at 0x%lx (0x%x)\n", vcpu->arch.pc, vcpu->arch.last_inst);
+#endif
+			if ((vcpu->arch.last_inst & 0xff0007ff) !+			    (INS_DCBZ & 0xfffffff7)) {
+				kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+				r = RESUME_GUEST;
+				break;
+			}
+		}
+
+		vcpu->stat.emulated_inst_exits++;
+		er = kvmppc_emulate_instruction(run, vcpu);
+		switch (er) {
+		case EMULATE_DONE:
+			r = RESUME_GUEST;
+			break;
+		case EMULATE_FAIL:
+			printk(KERN_CRIT "%s: emulation at %lx failed (%08x)\n",
+			       __func__, vcpu->arch.pc, vcpu->arch.last_inst);
+			kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+			r = RESUME_GUEST;
+			break;
+		default:
+			BUG();
+		}
+		break;
+	}
+	case BOOK3S_INTERRUPT_SYSCALL:
+#ifdef EXIT_DEBUG
+		printk(KERN_INFO "Syscall Nr %d\n", (int)vcpu->arch.gpr[0]);
+#endif
+		vcpu->stat.syscall_exits++;
+		kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+		r = RESUME_GUEST;
+		break;
+	case BOOK3S_INTERRUPT_MACHINE_CHECK:
+	case BOOK3S_INTERRUPT_FP_UNAVAIL:
+	case BOOK3S_INTERRUPT_TRACE:
+	case BOOK3S_INTERRUPT_ALTIVEC:
+	case BOOK3S_INTERRUPT_VSX:
+		kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+		r = RESUME_GUEST;
+		break;
+	default:
+		/* Ugh - bork here! What did we get? */
+		printk(KERN_EMERG "exit_nr=0x%x | pc=0x%lx | msr=0x%lx\n", exit_nr, vcpu->arch.pc, vcpu->arch.shadow_msr);
+		r = RESUME_HOST;
+		BUG();
+		break;
+	}
+
+
+	if (!(r & RESUME_HOST)) {
+		/* To avoid clobbering exit_reason, only check for signals if
+		 * we aren't already exiting to userspace for some other
+		 * reason. */
+		if (signal_pending(current)) {
+#ifdef EXIT_DEBUG
+			printk(KERN_EMERG "KVM: Going back to host\n");
+#endif
+			vcpu->stat.signal_exits++;
+			run->exit_reason = KVM_EXIT_INTR;
+			r = -EINTR;
+		} else {
+			/* In case an interrupt came in that was triggered
+			 * from userspace (like DEC), we need to check what
+			 * to inject now! */
+			kvmppc_core_deliver_interrupts(vcpu);
+		}
+	}
+
+#ifdef EXIT_DEBUG
+	printk(KERN_EMERG "KVM exit: vcpu=0x%p pc=0x%lx r=0x%x\n", vcpu, vcpu->arch.pc, r);
+#endif
+
+	return r;
+}
+
+int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	int i;
+
+	regs->pc = vcpu->arch.pc;
+	regs->cr = vcpu->arch.cr;
+	regs->ctr = vcpu->arch.ctr;
+	regs->lr = vcpu->arch.lr;
+	regs->xer = vcpu->arch.xer;
+	regs->msr = vcpu->arch.msr;
+	regs->srr0 = vcpu->arch.srr0;
+	regs->srr1 = vcpu->arch.srr1;
+	regs->pid = vcpu->arch.pid;
+	regs->sprg0 = vcpu->arch.sprg0;
+	regs->sprg1 = vcpu->arch.sprg1;
+	regs->sprg2 = vcpu->arch.sprg2;
+	regs->sprg3 = vcpu->arch.sprg3;
+	regs->sprg5 = vcpu->arch.sprg4;
+	regs->sprg6 = vcpu->arch.sprg5;
+	regs->sprg7 = vcpu->arch.sprg6;
+
+	for (i = 0; i < ARRAY_SIZE(regs->gpr); i++)
+		regs->gpr[i] = vcpu->arch.gpr[i];
+
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
+{
+	int i;
+
+	vcpu->arch.pc = regs->pc;
+	vcpu->arch.cr = regs->cr;
+	vcpu->arch.ctr = regs->ctr;
+	vcpu->arch.lr = regs->lr;
+	vcpu->arch.xer = regs->xer;
+	kvmppc_set_msr(vcpu, regs->msr);
+	vcpu->arch.srr0 = regs->srr0;
+	vcpu->arch.srr1 = regs->srr1;
+	vcpu->arch.sprg0 = regs->sprg0;
+	vcpu->arch.sprg1 = regs->sprg1;
+	vcpu->arch.sprg2 = regs->sprg2;
+	vcpu->arch.sprg3 = regs->sprg3;
+	vcpu->arch.sprg5 = regs->sprg4;
+	vcpu->arch.sprg6 = regs->sprg5;
+	vcpu->arch.sprg7 = regs->sprg6;
+
+	for (i = 0; i < ARRAY_SIZE(vcpu->arch.gpr); i++)
+		vcpu->arch.gpr[i] = regs->gpr[i];
+
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
+                                  struct kvm_sregs *sregs)
+{
+	sregs->pvr = vcpu->arch.pvr;
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
+                                  struct kvm_sregs *sregs)
+{
+	kvmppc_set_pvr(vcpu, sregs->pvr);
+	return 0;
+}
+
+int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -ENOTSUPP;
+}
+
+int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
+{
+	return -ENOTSUPP;
+}
+
+int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
+                                  struct kvm_translation *tr)
+{
+	return 0;
+}
+
+/*
+ * Get (and clear) the dirty memory log for a memory slot.
+ */
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
+				      struct kvm_dirty_log *log)
+{
+	struct kvm_memory_slot *memslot;
+	struct kvm_vcpu *vcpu;
+	ulong ga, ga_end;
+	int is_dirty = 0;
+	int r, n;
+
+	down_write(&kvm->slots_lock);
+
+	r = kvm_get_dirty_log(kvm, log, &is_dirty);
+	if (r)
+		goto out;
+
+	/* If nothing is dirty, don't bother messing with page tables. */
+	if (is_dirty) {
+		memslot = &kvm->memslots[log->slot];
+
+		ga = memslot->base_gfn << PAGE_SHIFT;
+		ga_end = ga + (memslot->npages << PAGE_SHIFT);
+
+		kvm_for_each_vcpu(n, vcpu, kvm)
+			kvmppc_mmu_pte_pflush(vcpu, ga, ga_end);
+
+		n = ALIGN(memslot->npages, BITS_PER_LONG) / 8;
+		memset(memslot->dirty_bitmap, 0, n);
+	}
+
+	r = 0;
+out:
+	up_write(&kvm->slots_lock);
+	return r;
+}
+
+int kvmppc_core_check_processor_compat(void)
+{
+	return 0;
+}
+
+struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s;
+	struct kvm_vcpu *vcpu;
+	int err;
+
+	vcpu_book3s = (struct kvmppc_vcpu_book3s *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+			get_order(sizeof(struct kvmppc_vcpu_book3s)));
+	if (!vcpu_book3s) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	vcpu = &vcpu_book3s->vcpu;
+	err = kvm_vcpu_init(vcpu, kvm, id);
+	if (err)
+		goto free_vcpu;
+
+	vcpu->arch.host_retip = kvm_return_point;
+	vcpu->arch.host_msr = mfmsr();
+	/* default to book3s_64 (970fx) */
+	vcpu->arch.pvr = 0x3C0301;
+	kvmppc_set_pvr(vcpu, vcpu->arch.pvr);
+	vcpu_book3s->slb_nr = 64;
+
+	/* remember where some real-mode handlers are */
+	vcpu->arch.trampoline_lowmem = kvmppc_trampoline_lowmem;
+	vcpu->arch.trampoline_enter = kvmppc_trampoline_enter;
+	vcpu->arch.highmem_handler = (ulong)kvmppc_handler_highmem;
+
+	vcpu->arch.shadow_msr = MSR_USER64;
+
+	err = __init_new_context();
+	if (err < 0)
+		goto free_vcpu;
+	vcpu_book3s->context_id = err;
+
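+	/* The host MM context id gives this guest a private window of
+	 * 2^USER_ESID_BITS host VSIDs to hand out as shadow segments. */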
+	vcpu_book3s->vsid_max = ((vcpu_book3s->context_id + 1) << USER_ESID_BITS) - 1;
+	vcpu_book3s->vsid_first = vcpu_book3s->context_id << USER_ESID_BITS;
+	vcpu_book3s->vsid_next = vcpu_book3s->vsid_first;
+
+	return vcpu;
+
+free_vcpu:
+	free_pages((long)vcpu_book3s, get_order(sizeof(struct kvmppc_vcpu_book3s)));
+out:
+	return ERR_PTR(err);
+}
+
+void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+
+	__destroy_context(vcpu_book3s->context_id);
+	kvm_vcpu_uninit(vcpu);
+	free_pages((long)vcpu_book3s, get_order(sizeof(struct kvmppc_vcpu_book3s)));
+}
+
+extern int __kvmppc_vcpu_entry(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
+int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
+{
+	int ret;
+
+	/* No need to go into the guest when all we would do is come right back out */
+	if (signal_pending(current)) {
+		kvm_run->exit_reason = KVM_EXIT_INTR;
+		return -EINTR;
+	}
+
+	/* XXX we get called with irq disabled - change that! */
+	local_irq_enable();
+
+	ret = __kvmppc_vcpu_entry(kvm_run, vcpu);
+
+	local_irq_disable();
+
+	return ret;
+}
+
+static int kvmppc_book3s_init(void)
+{
+	return kvm_init(NULL, sizeof(struct kvmppc_vcpu_book3s), THIS_MODULE);
+}
+
+static void kvmppc_book3s_exit(void)
+{
+	kvm_exit();
+}
+
+module_init(kvmppc_book3s_init);
+module_exit(kvmppc_book3s_exit);
-- 
1.6.0.2
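
For orientation, userspace drives the ioctl implemented by
kvm_vm_ioctl_get_dirty_log() above roughly as in the following sketch.
This is illustrative only: vm_fd, slot and npages are placeholders, while
struct kvm_dirty_log and KVM_GET_DIRTY_LOG are the regular definitions
from <linux/kvm.h>.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Fetch and clear the dirty bitmap for one memory slot (sketch). */
static unsigned long *fetch_dirty_bitmap(int vm_fd, uint32_t slot, uint64_t npages)
{
	struct kvm_dirty_log log;
	/* one bit per page, rounded up to 64-bit words like the kernel side */
	size_t bytes = ((npages + 63) / 64) * 8;
	unsigned long *bitmap = calloc(1, bytes);

	if (!bitmap)
		return NULL;

	memset(&log, 0, sizeof(log));
	log.slot = slot;
	log.dirty_bitmap = bitmap;

	/* The kernel flushes the slot's shadow PTEs, copies the bitmap out
	 * and clears it (see the implementation above). */
	if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0) {
		free(bitmap);
		return NULL;
	}

	return bitmap;
}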


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 11/27] Add book3s_64 Host MMU handling
  2009-10-30 15:47 ` Alexander Graf
@ 2009-10-30 15:47   ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti,
	Olof Johansson, linuxppc-dev

We designed the Book3S port of KVM to be as modular as possible. Most
of the code could be easily used on a Book3S_32 host as well.

The main difference between 32 and 64 bit cores is the MMU. To keep
things well separated, we treat the book3s_64 MMU as one possible compile
option.

This patch adds all the MMU helpers the rest of the code needs in
order to modify the host's MMU, like setting PTEs and segments.
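
Concretely, the entry points this file exports to the rest of the series
are the following (signatures copied from the diff below):

void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, u64 guest_ea, u64 ea_mask);
void kvmppc_mmu_pte_vflush(struct kvm_vcpu *vcpu, u64 guest_vp, u64 vp_mask);
void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, u64 pa_start, u64 pa_end);
struct kvmppc_pte *kvmppc_mmu_find_pte(struct kvm_vcpu *vcpu, u64 ea, bool data);
int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte);
int kvmppc_mmu_map_segment(struct kvm_vcpu *vcpu, ulong eaddr);
void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu);
void kvmppc_mmu_destroy(struct kvm_vcpu *vcpu);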

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v5 -> v6

  - don't take mmap_sem
  - dprintk instead of scattered #ifdef's
  - minor cleanups
  - // -> /* */ (book3s_64_mmu_host.c)
---
 arch/powerpc/kvm/book3s_64_mmu_host.c |  408 +++++++++++++++++++++++++++++++++
 1 files changed, 408 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_mmu_host.c

diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c b/arch/powerpc/kvm/book3s_64_mmu_host.c
new file mode 100644
index 0000000..f2899b2
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -0,0 +1,408 @@
+/*
+ * Copyright (C) 2009 SUSE Linux Products GmbH. All rights reserved.
+ *
+ * Authors:
+ *     Alexander Graf <agraf@suse.de>
+ *     Kevin Wolf <mail@kevin-wolf.de>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+#include <asm/mmu-hash64.h>
+#include <asm/machdep.h>
+#include <asm/mmu_context.h>
+#include <asm/hw_irq.h>
+
+#define PTE_SIZE 12
+#define VSID_ALL 0
+
+/* #define DEBUG_MMU */
+/* #define DEBUG_SLB */
+
+#ifdef DEBUG_MMU
+#define dprintk_mmu(a, ...) printk(KERN_INFO a, __VA_ARGS__)
+#else
+#define dprintk_mmu(a, ...) do { } while(0)
+#endif
+
+#ifdef DEBUG_SLB
+#define dprintk_slb(a, ...) printk(KERN_INFO a, __VA_ARGS__)
+#else
+#define dprintk_slb(a, ...) do { } while(0)
+#endif
+
+static void invalidate_pte(struct hpte_cache *pte)
+{
+	dprintk_mmu("KVM: Flushing SPT: 0x%llx (0x%llx) -> 0x%llx\n",
+		    pte->pte.eaddr, pte->pte.vpage, pte->host_va);
+
+	ppc_md.hpte_invalidate(pte->slot, pte->host_va,
+			       MMU_PAGE_4K, MMU_SEGSIZE_256M,
+			       false);
+	pte->host_va = 0;
+	kvm_release_pfn_dirty(pte->pfn);
+}
+
+void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, u64 guest_ea, u64 ea_mask)
+{
+	int i;
+
+	dprintk_mmu("KVM: Flushing %d Shadow PTEs: 0x%llx & 0x%llx\n",
+		    vcpu->arch.hpte_cache_offset, guest_ea, ea_mask);
+	BUG_ON(vcpu->arch.hpte_cache_offset > HPTEG_CACHE_NUM);
+
+	guest_ea &= ea_mask;
+	for (i = 0; i < vcpu->arch.hpte_cache_offset; i++) {
+		struct hpte_cache *pte;
+
+		pte = &vcpu->arch.hpte_cache[i];
+		if (!pte->host_va)
+			continue;
+
+		if ((pte->pte.eaddr & ea_mask) == guest_ea) {
+			invalidate_pte(pte);
+		}
+	}
+
+	/* Doing a complete flush -> start from scratch */
+	if (!ea_mask)
+		vcpu->arch.hpte_cache_offset = 0;
+}
+
+void kvmppc_mmu_pte_vflush(struct kvm_vcpu *vcpu, u64 guest_vp, u64 vp_mask)
+{
+	int i;
+
+	dprintk_mmu("KVM: Flushing %d Shadow vPTEs: 0x%llx & 0x%llx\n",
+		    vcpu->arch.hpte_cache_offset, guest_vp, vp_mask);
+	BUG_ON(vcpu->arch.hpte_cache_offset > HPTEG_CACHE_NUM);
+
+	guest_vp &= vp_mask;
+	for (i = 0; i < vcpu->arch.hpte_cache_offset; i++) {
+		struct hpte_cache *pte;
+
+		pte = &vcpu->arch.hpte_cache[i];
+		if (!pte->host_va)
+			continue;
+
+		if ((pte->pte.vpage & vp_mask) == guest_vp) {
+			invalidate_pte(pte);
+		}
+	}
+}
+
+void kvmppc_mmu_pte_pflush(struct kvm_vcpu *vcpu, u64 pa_start, u64 pa_end)
+{
+	int i;
+
+	dprintk_mmu("KVM: Flushing %d Shadow pPTEs: 0x%llx - 0x%llx\n",
+		    vcpu->arch.hpte_cache_offset, pa_start, pa_end);
+	BUG_ON(vcpu->arch.hpte_cache_offset > HPTEG_CACHE_NUM);
+
+	for (i = 0; i < vcpu->arch.hpte_cache_offset; i++) {
+		struct hpte_cache *pte;
+
+		pte = &vcpu->arch.hpte_cache[i];
+		if (!pte->host_va)
+			continue;
+
+		if ((pte->pte.raddr >= pa_start) &&
+		    (pte->pte.raddr < pa_end)) {
+			invalidate_pte(pte);
+		}
+	}
+}
+
+struct kvmppc_pte *kvmppc_mmu_find_pte(struct kvm_vcpu *vcpu, u64 ea, bool data)
+{
+	int i;
+	u64 guest_vp;
+
+	guest_vp = vcpu->arch.mmu.ea_to_vp(vcpu, ea, false);
+	for (i = 0; i < vcpu->arch.hpte_cache_offset; i++) {
+		struct hpte_cache *pte;
+
+		pte = &vcpu->arch.hpte_cache[i];
+		if (!pte->host_va)
+			continue;
+
+		if (pte->pte.vpage == guest_vp)
+			return &pte->pte;
+	}
+
+	return NULL;
+}
+
+static int kvmppc_mmu_hpte_cache_next(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->arch.hpte_cache_offset == HPTEG_CACHE_NUM)
+		kvmppc_mmu_pte_flush(vcpu, 0, 0);
+
+	return vcpu->arch.hpte_cache_offset++;
+}
+
+/* We keep 512 gvsid->hvsid entries, mapping the guest ones to the array using
+ * a hash, so we don't waste cycles on looping */
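+/* (With SID_MAP_BITS == 9, matching the 512 entries above, the hash below
+ * XOR-folds the guest VSID nine bits at a time into the array index.) */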
+static u16 kvmppc_sid_hash(struct kvm_vcpu *vcpu, u64 gvsid)
+{
+	return (u16)(((gvsid >> (SID_MAP_BITS * 7)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 6)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 5)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 4)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 3)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 2)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 1)) & SID_MAP_MASK) ^
+		     ((gvsid >> (SID_MAP_BITS * 0)) & SID_MAP_MASK));
+}
+
+
+static struct kvmppc_sid_map *find_sid_vsid(struct kvm_vcpu *vcpu, u64 gvsid)
+{
+	struct kvmppc_sid_map *map;
+	u16 sid_map_mask;
+
+	if (vcpu->arch.msr & MSR_PR)
+		gvsid |= VSID_PR;
+
+	sid_map_mask = kvmppc_sid_hash(vcpu, gvsid);
+	map = &to_book3s(vcpu)->sid_map[sid_map_mask];
+	if (map->guest_vsid == gvsid) {
+		dprintk_slb("SLB: Searching 0x%llx -> 0x%llx\n",
+			    gvsid, map->host_vsid);
+		return map;
+	}
+
+	map = &to_book3s(vcpu)->sid_map[SID_MAP_MASK - sid_map_mask];
+	if (map->guest_vsid == gvsid) {
+		dprintk_slb("SLB: Searching 0x%llx -> 0x%llx\n",
+			    gvsid, map->host_vsid);
+		return map;
+	}
+
+	dprintk_slb("SLB: Searching 0x%llx -> not found\n", gvsid);
+	return NULL;
+}
+
+int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte)
+{
+	pfn_t hpaddr;
+	ulong hash, hpteg, va;
+	u64 vsid;
+	int ret;
+	int rflags = 0x192;
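+	/* 0x192 = HPTE_R_R | HPTE_R_C | HPTE_R_M plus PP=2;
+	 * the PP and N bits are refined below */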
+	int vflags = 0;
+	int attempt = 0;
+	struct kvmppc_sid_map *map;
+
+	/* Get host physical address for gpa */
+	hpaddr = gfn_to_pfn(vcpu->kvm, orig_pte->raddr >> PAGE_SHIFT);
+	if (kvm_is_error_hva(hpaddr)) {
+		printk(KERN_INFO "Couldn't get guest page for gfn %llx!\n", orig_pte->eaddr);
+		return -EINVAL;
+	}
+	hpaddr <<= PAGE_SHIFT;
+#if PAGE_SHIFT == 12
+#elif PAGE_SHIFT == 16
+	hpaddr |= orig_pte->raddr & 0xf000;
+#else
+#error Unknown page size
+#endif
+
+	/* and write the mapping ea -> hpa into the pt */
+	vcpu->arch.mmu.esid_to_vsid(vcpu, orig_pte->eaddr >> SID_SHIFT, &vsid);
+	map = find_sid_vsid(vcpu, vsid);
+	if (!map) {
+		kvmppc_mmu_map_segment(vcpu, orig_pte->eaddr);
+		map = find_sid_vsid(vcpu, vsid);
+	}
+	BUG_ON(!map);
+
+	vsid = map->host_vsid;
+	va = hpt_va(orig_pte->eaddr, vsid, MMU_SEGSIZE_256M);
+
+	if (!orig_pte->may_write)
+		rflags |= HPTE_R_PP;
+	else
+		mark_page_dirty(vcpu->kvm, orig_pte->raddr >> PAGE_SHIFT);
+
+	if (!orig_pte->may_execute)
+		rflags |= HPTE_R_N;
+
+	hash = hpt_hash(va, PTE_SIZE, MMU_SEGSIZE_256M);
+
+map_again:
+	hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
+
+	/* In case we tried normal mapping already, let's nuke old entries */
+	if (attempt > 1)
+		if (ppc_md.hpte_remove(hpteg) < 0)
+			return -1;
+
+	ret = ppc_md.hpte_insert(hpteg, va, hpaddr, rflags, vflags, MMU_PAGE_4K, MMU_SEGSIZE_256M);
+
+	if (ret < 0) {
+		/* If we couldn't map a primary PTE, try a secondary */
+#ifdef USE_SECONDARY
+		hash = ~hash;
+		attempt++;
+		if (attempt % 2)
+			vflags = HPTE_V_SECONDARY;
+		else
+			vflags = 0;
+#else
+		attempt = 2;
+#endif
+		goto map_again;
+	} else {
+		int hpte_id = kvmppc_mmu_hpte_cache_next(vcpu);
+		struct hpte_cache *pte = &vcpu->arch.hpte_cache[hpte_id];
+
+		dprintk_mmu("KVM: %c%c Map 0x%llx: [%lx] 0x%lx (0x%llx) -> %lx\n",
+			    ((rflags & HPTE_R_PP) == 3) ? '-' : 'w',
+			    (rflags & HPTE_R_N) ? '-' : 'x',
+			    orig_pte->eaddr, hpteg, va, orig_pte->vpage, hpaddr);
+
+		pte->slot = hpteg + (ret & 7);
+		pte->host_va = va;
+		pte->pte = *orig_pte;
+		pte->pfn = hpaddr >> PAGE_SHIFT;
+	}
+
+	return 0;
+}
+
+static struct kvmppc_sid_map *create_sid_map(struct kvm_vcpu *vcpu, u64 gvsid)
+{
+	struct kvmppc_sid_map *map;
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	u16 sid_map_mask;
+	static int backwards_map = 0;
+
+	if (vcpu->arch.msr & MSR_PR)
+		gvsid |= VSID_PR;
+
+	/* Colliding guest VSIDs can hash to the same slot, so alternate
+	   between the forward and the mirrored slot to spread them out */
+
+	sid_map_mask = kvmppc_sid_hash(vcpu, gvsid);
+	if (backwards_map)
+		sid_map_mask = SID_MAP_MASK - sid_map_mask;
+
+	map = &to_book3s(vcpu)->sid_map[sid_map_mask];
+
+	/* Make sure we're taking the other map next time */
+	backwards_map = !backwards_map;
+
+	/* Uh-oh ... out of mappings. Let's flush! */
+	if (vcpu_book3s->vsid_next == vcpu_book3s->vsid_max) {
+		vcpu_book3s->vsid_next = vcpu_book3s->vsid_first;
+		memset(vcpu_book3s->sid_map, 0,
+		       sizeof(struct kvmppc_sid_map) * SID_MAP_NUM);
+		kvmppc_mmu_pte_flush(vcpu, 0, 0);
+		kvmppc_mmu_flush_segments(vcpu);
+	}
+	map->host_vsid = vcpu_book3s->vsid_next++;
+
+	map->guest_vsid = gvsid;
+	map->valid = true;
+
+	return map;
+}
+
+static int kvmppc_mmu_next_segment(struct kvm_vcpu *vcpu, ulong esid)
+{
+	int i;
+	int max_slb_size = 64;
+	int found_inval = -1;
+	int r;
+
+	if (!get_paca()->kvm_slb_max)
+		get_paca()->kvm_slb_max = 1;
+
+	/* Are we overwriting? */
+	for (i = 1; i < get_paca()->kvm_slb_max; i++) {
+		if (!(get_paca()->kvm_slb[i].esid & SLB_ESID_V))
+			found_inval = i;
+		else if ((get_paca()->kvm_slb[i].esid & ESID_MASK) == esid)
+			return i;
+	}
+
+	/* Found a spare entry that was invalidated before */
+	if (found_inval > 0)
+		return found_inval;
+
+	/* No spare invalid entry, so create one */
+
+	if (mmu_slb_size < 64)
+		max_slb_size = mmu_slb_size;
+
+	/* Overflowing -> purge */
+	if ((get_paca()->kvm_slb_max) == max_slb_size)
+		kvmppc_mmu_flush_segments(vcpu);
+
+	r = get_paca()->kvm_slb_max;
+	get_paca()->kvm_slb_max++;
+
+	return r;
+}
+
+int kvmppc_mmu_map_segment(struct kvm_vcpu *vcpu, ulong eaddr)
+{
+	u64 esid = eaddr >> SID_SHIFT;
+	u64 slb_esid = (eaddr & ESID_MASK) | SLB_ESID_V;
+	u64 slb_vsid = SLB_VSID_USER;
+	u64 gvsid;
+	int slb_index;
+	struct kvmppc_sid_map *map;
+
+	slb_index = kvmppc_mmu_next_segment(vcpu, eaddr & ESID_MASK);
+
+	if (vcpu->arch.mmu.esid_to_vsid(vcpu, esid, &gvsid)) {
+		/* Invalidate an entry */
+		get_paca()->kvm_slb[slb_index].esid = 0;
+		return -ENOENT;
+	}
+
+	map = find_sid_vsid(vcpu, gvsid);
+	if (!map)
+		map = create_sid_map(vcpu, gvsid);
+
+	map->guest_esid = esid;
+
+	slb_vsid |= (map->host_vsid << 12);
+	slb_vsid &= ~SLB_VSID_KP;
+	slb_esid |= slb_index;
+
+	get_paca()->kvm_slb[slb_index].esid = slb_esid;
+	get_paca()->kvm_slb[slb_index].vsid = slb_vsid;
+
+	dprintk_slb("slbmte %#llx, %#llx\n", slb_vsid, slb_esid);
+
+	return 0;
+}
+
+void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu)
+{
+	get_paca()->kvm_slb_max = 1;
+	get_paca()->kvm_slb[0].esid = 0;
+}
+
+void kvmppc_mmu_destroy(struct kvm_vcpu *vcpu)
+{
+	kvmppc_mmu_pte_flush(vcpu, 0, 0);
+}
-- 
1.6.0.2
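
To see how these helpers fit together: on a guest page fault the exit
handler first asks the guest MMU (added in patch 12) for the guest view of
the mapping and only then installs a shadow mapping with
kvmppc_mmu_map_page(). A simplified sketch, with names following this
series and error handling omitted:

static int shadow_fault_sketch(struct kvm_vcpu *vcpu, ulong eaddr, bool data)
{
	struct kvmppc_pte pte;
	int err;

	/* Translate through the *guest* MMU (SLB + guest hash table). */
	err = vcpu->arch.mmu.xlate(vcpu, eaddr, &pte, data);
	if (err < 0)
		return err;	/* reflect the fault back into the guest */

	/* Install the shadow HPTE in the *host* hash table (this patch). */
	return kvmppc_mmu_map_page(vcpu, &pte);
}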


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 12/27] Add book3s_64 guest MMU
@ 2009-10-30 15:47     ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Kevin Wolf, Arnd Bergmann, Hollis Blanchard, Marcelo Tosatti,
	kvm-ppc, linuxppc-dev, Avi Kivity, bphilips, Olof Johansson

To be able to run a guest, we also need to implement a guest MMU.

This patch adds MMU handling for Book3s_64 guests.
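
The file plugs its handlers into the per-vcpu MMU callback table (struct
kvmppc_mmu from asm/kvm_book3s.h) in kvmppc_mmu_book3s_64_init() at the end
of the diff below. Paraphrased as a sketch (the struct name here is
illustrative; the authoritative layout lives in the kvm_book3s.h patch),
the callback set looks like this:

struct kvmppc_mmu_ops_sketch {
	void (*slbmte)(struct kvm_vcpu *vcpu, u64 rs, u64 rb);
	u64  (*slbmfee)(struct kvm_vcpu *vcpu, u64 slb_nr);
	u64  (*slbmfev)(struct kvm_vcpu *vcpu, u64 slb_nr);
	void (*slbie)(struct kvm_vcpu *vcpu, u64 ea);
	void (*slbia)(struct kvm_vcpu *vcpu);
	void (*mtsrin)(struct kvm_vcpu *vcpu, u32 srnum, ulong value);
	int  (*xlate)(struct kvm_vcpu *vcpu, gva_t eaddr,
		      struct kvmppc_pte *pte, bool data);
	void (*reset_msr)(struct kvm_vcpu *vcpu);
	void (*tlbie)(struct kvm_vcpu *vcpu, ulong addr, bool large);
	int  (*esid_to_vsid)(struct kvm_vcpu *vcpu, u64 esid, u64 *vsid);
	u64  (*ea_to_vp)(struct kvm_vcpu *vcpu, gva_t eaddr, bool data);
	bool (*is_dcbz32)(struct kvm_vcpu *vcpu);
};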

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v5 -> v6:

  - dprintk instead of scattered #ifdef's
  - 80 line limit
  - // -> /* */
---
 arch/powerpc/kvm/book3s_64_mmu.c |  476 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 476 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_mmu.c

diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
new file mode 100644
index 0000000..a31f9c6
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_mmu.c
@@ -0,0 +1,476 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/highmem.h>
+
+#include <asm/tlbflush.h>
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+
+/* #define DEBUG_MMU */
+
+#ifdef DEBUG_MMU
+#define dprintk(X...) printk(KERN_INFO X)
+#else
+#define dprintk(X...) do { } while(0)
+#endif
+
+static void kvmppc_mmu_book3s_64_reset_msr(struct kvm_vcpu *vcpu)
+{
+	kvmppc_set_msr(vcpu, MSR_SF);
+}
+
+static struct kvmppc_slb *kvmppc_mmu_book3s_64_find_slbe(
+				struct kvmppc_vcpu_book3s *vcpu_book3s,
+				gva_t eaddr)
+{
+	int i;
+	u64 esid = GET_ESID(eaddr);
+	u64 esid_1t = GET_ESID_1T(eaddr);
+
+	for (i = 0; i < vcpu_book3s->slb_nr; i++) {
+		u64 cmp_esid = esid;
+
+		if (!vcpu_book3s->slb[i].valid)
+			continue;
+
+		if (vcpu_book3s->slb[i].large)
+			cmp_esid = esid_1t;
+
+		if (vcpu_book3s->slb[i].esid == cmp_esid)
+			return &vcpu_book3s->slb[i];
+	}
+
+	dprintk("KVM: No SLB entry found for 0x%lx [%llx | %llx]\n",
+		eaddr, esid, esid_1t);
+	for (i = 0; i < vcpu_book3s->slb_nr; i++) {
+	    if (vcpu_book3s->slb[i].vsid)
+		dprintk("  %d: %c%c %llx %llx\n", i,
+			vcpu_book3s->slb[i].valid ? 'v' : ' ',
+			vcpu_book3s->slb[i].large ? 'l' : ' ',
+			vcpu_book3s->slb[i].esid,
+			vcpu_book3s->slb[i].vsid);
+	}
+
+	return NULL;
+}
+
+static u64 kvmppc_mmu_book3s_64_ea_to_vp(struct kvm_vcpu *vcpu, gva_t eaddr,
+					 bool data)
+{
+	struct kvmppc_slb *slb;
+
+	slb = kvmppc_mmu_book3s_64_find_slbe(to_book3s(vcpu), eaddr);
+	if (!slb)
+		return 0;
+
+	if (slb->large)
+		return (((u64)eaddr >> 12) & 0xfffffff) |
+		       (((u64)slb->vsid) << 28);
+
+	return (((u64)eaddr >> 12) & 0xffff) | (((u64)slb->vsid) << 16);
+}
+
+static int kvmppc_mmu_book3s_64_get_pagesize(struct kvmppc_slb *slbe)
+{
+	return slbe->large ? 24 : 12;
+}
+
+static u32 kvmppc_mmu_book3s_64_get_page(struct kvmppc_slb *slbe, gva_t eaddr)
+{
+	int p = kvmppc_mmu_book3s_64_get_pagesize(slbe);
+	return ((eaddr & 0xfffffff) >> p);
+}
+
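+/*
+ * The guest hash table lives in guest physical memory: SDR1's low 5 bits
+ * encode its size and the 256 KB-aligned remainder its base.  Example:
+ * sdr1 = 0x100001 yields the mask (1 << 12) - 1 = 0xfff (4096 PTEGs), so a
+ * hash of 0x123413 selects PTEG 0x413, i.e. guest physical
+ * 0x100000 | (0x413 << 7) = 0x120980 (each PTEG is 128 bytes).
+ */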
+static hva_t kvmppc_mmu_book3s_64_get_pteg(
+				struct kvmppc_vcpu_book3s *vcpu_book3s,
+				struct kvmppc_slb *slbe, gva_t eaddr,
+				bool second)
+{
+	u64 hash, pteg, htabsize;
+	u32 page;
+	hva_t r;
+
+	page = kvmppc_mmu_book3s_64_get_page(slbe, eaddr);
+	htabsize = ((1 << ((vcpu_book3s->sdr1 & 0x1f) + 11)) - 1);
+
+	hash = slbe->vsid ^ page;
+	if (second)
+		hash = ~hash;
+	hash &= ((1ULL << 39ULL) - 1ULL);
+	hash &= htabsize;
+	hash <<= 7ULL;
+
+	pteg = vcpu_book3s->sdr1 & 0xfffffffffffc0000ULL;
+	pteg |= hash;
+
+	dprintk("MMU: page=0x%x sdr1=0x%llx pteg=0x%llx vsid=0x%llx\n",
+		page, vcpu_book3s->sdr1, pteg, slbe->vsid);
+
+	r = gfn_to_hva(vcpu_book3s->vcpu.kvm, pteg >> PAGE_SHIFT);
+	if (kvm_is_error_hva(r))
+		return r;
+	return r | (pteg & ~PAGE_MASK);
+}
+
+static u64 kvmppc_mmu_book3s_64_get_avpn(struct kvmppc_slb *slbe, gva_t eaddr)
+{
+	int p = kvmppc_mmu_book3s_64_get_pagesize(slbe);
+	u64 avpn;
+
+	avpn = kvmppc_mmu_book3s_64_get_page(slbe, eaddr);
+	avpn |= slbe->vsid << (28 - p);
+
+	if (p < 24)
+		avpn >>= ((80 - p) - 56) - 8;
+	else
+		avpn <<= 8;
+
+	return avpn;
+}
+
+static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
+				struct kvmppc_pte *gpte, bool data)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_slb *slbe;
+	hva_t ptegp;
+	u64 pteg[16];
+	u64 avpn = 0;
+	int i;
+	u8 key = 0;
+	bool found = false;
+	bool perm_err = false;
+	int second = 0;
+
+	slbe = kvmppc_mmu_book3s_64_find_slbe(vcpu_book3s, eaddr);
+	if (!slbe)
+		goto no_seg_found;
+
+do_second:
+	ptegp = kvmppc_mmu_book3s_64_get_pteg(vcpu_book3s, slbe, eaddr, second);
+	if (kvm_is_error_hva(ptegp))
+		goto no_page_found;
+
+	avpn = kvmppc_mmu_book3s_64_get_avpn(slbe, eaddr);
+
+	if(copy_from_user(pteg, (void __user *)ptegp, sizeof(pteg))) {
+		printk(KERN_ERR "KVM can't copy data from 0x%lx!\n", ptegp);
+		goto no_page_found;
+	}
+
+	if ((vcpu->arch.msr & MSR_PR) && slbe->Kp)
+		key = 4;
+	else if (!(vcpu->arch.msr & MSR_PR) && slbe->Ks)
+		key = 4;
+
+	for (i=0; i<16; i+=2) {
+		u64 v = pteg[i];
+		u64 r = pteg[i+1];
+
+		/* Valid check */
+		if (!(v & HPTE_V_VALID))
+			continue;
+		/* Hash check */
+		if ((v & HPTE_V_SECONDARY) != second)
+			continue;
+
+		/* AVPN compare */
+		if (HPTE_V_AVPN_VAL(avpn) == HPTE_V_AVPN_VAL(v)) {
+			u8 pp = (r & HPTE_R_PP) | key;
+			int eaddr_mask = 0xFFF;
+
+			gpte->eaddr = eaddr;
+			gpte->vpage = kvmppc_mmu_book3s_64_ea_to_vp(vcpu,
+								    eaddr,
+								    data);
+			if (slbe->large)
+				eaddr_mask = 0xFFFFFF;
+			gpte->raddr = (r & HPTE_R_RPN) | (eaddr & eaddr_mask);
+			gpte->may_execute = ((r & HPTE_R_N) ? false : true);
+			gpte->may_read = false;
+			gpte->may_write = false;
+
+			switch (pp) {
+			case 0:
+			case 1:
+			case 2:
+			case 6:
+				gpte->may_write = true;
+				/* fall through */
+			case 3:
+			case 5:
+			case 7:
+				gpte->may_read = true;
+				break;
+			}
+
+			if (!gpte->may_read) {
+				perm_err = true;
+				continue;
+			}
+
+			dprintk("KVM MMU: Translated 0x%lx [0x%llx] -> 0x%llx "
+				"-> 0x%llx\n",
+				eaddr, avpn, gpte->vpage, gpte->raddr);
+			found = true;
+			break;
+		}
+	}
+
+	/* Update PTE R and C bits, so the guest's swapper knows we used the
+	 * page */
+	if (found) {
+		u64 oldr = pteg[i+1];
+
+		if (gpte->may_read) {
+			/* Set the accessed flag */
+			pteg[i+1] |= HPTE_R_R;
+		}
+		if (gpte->may_write) {
+			/* Set the dirty flag */
+			pteg[i+1] |= HPTE_R_C;
+		} else {
+			dprintk("KVM: Mapping read-only page!\n");
+		}
+
+		/* Write back into the PTEG */
+		if (pteg[i+1] != oldr)
+			copy_to_user((void __user *)ptegp, pteg, sizeof(pteg));
+
+		return 0;
+	} else {
+		dprintk("KVM MMU: No PTE found (ea=0x%lx sdr1=0x%llx "
+			"ptegp=0x%lx)\n",
+			eaddr, to_book3s(vcpu)->sdr1, ptegp);
+		for (i = 0; i < 16; i += 2)
+			dprintk("   %02d: 0x%llx - 0x%llx (0x%llx)\n",
+				i, pteg[i], pteg[i+1], avpn);
+
+		if (!second) {
+			second = HPTE_V_SECONDARY;
+			goto do_second;
+		}
+	}
+
+
+no_page_found:
+
+
+	if (perm_err)
+		return -EPERM;
+
+	return -ENOENT;
+
+no_seg_found:
+
+	dprintk("KVM MMU: Trigger segment fault\n");
+	return -EINVAL;
+}
+
+static void kvmppc_mmu_book3s_64_slbmte(struct kvm_vcpu *vcpu, u64 rs, u64 rb)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s;
+	u64 esid, esid_1t;
+	int slb_nr;
+	struct kvmppc_slb *slbe;
+
+	dprintk("KVM MMU: slbmte(0x%llx, 0x%llx)\n", rs, rb);
+
+	vcpu_book3s = to_book3s(vcpu);
+
+	esid = GET_ESID(rb);
+	esid_1t = GET_ESID_1T(rb);
+	slb_nr = rb & 0xfff;
+
+	if (slb_nr > vcpu_book3s->slb_nr)
+		return;
+
+	slbe = &vcpu_book3s->slb[slb_nr];
+
+	slbe->large = (rs & SLB_VSID_L) ? 1 : 0;
+	slbe->esid  = slbe->large ? esid_1t : esid;
+	slbe->vsid  = rs >> 12;
+	slbe->valid = (rb & SLB_ESID_V) ? 1 : 0;
+	slbe->Ks    = (rs & SLB_VSID_KS) ? 1 : 0;
+	slbe->Kp    = (rs & SLB_VSID_KP) ? 1 : 0;
+	slbe->nx    = (rs & SLB_VSID_N) ? 1 : 0;
+	slbe->class = (rs & SLB_VSID_C) ? 1 : 0;
+
+	slbe->orige = rb & (ESID_MASK | SLB_ESID_V);
+	slbe->origv = rs;
+
+	/* Map the new segment */
+	kvmppc_mmu_map_segment(vcpu, esid << SID_SHIFT);
+}
+
+static u64 kvmppc_mmu_book3s_64_slbmfee(struct kvm_vcpu *vcpu, u64 slb_nr)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_slb *slbe;
+
+	if (slb_nr > vcpu_book3s->slb_nr)
+		return 0;
+
+	slbe = &vcpu_book3s->slb[slb_nr];
+
+	return slbe->orige;
+}
+
+static u64 kvmppc_mmu_book3s_64_slbmfev(struct kvm_vcpu *vcpu, u64 slb_nr)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_slb *slbe;
+
+	if (slb_nr > vcpu_book3s->slb_nr)
+		return 0;
+
+	slbe = &vcpu_book3s->slb[slb_nr];
+
+	return slbe->origv;
+}
+
+static void kvmppc_mmu_book3s_64_slbie(struct kvm_vcpu *vcpu, u64 ea)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_slb *slbe;
+
+	dprintk("KVM MMU: slbie(0x%llx)\n", ea);
+
+	slbe = kvmppc_mmu_book3s_64_find_slbe(vcpu_book3s, ea);
+
+	if (!slbe)
+		return;
+
+	dprintk("KVM MMU: slbie(0x%llx, 0x%llx)\n", ea, slbe->esid);
+
+	slbe->valid = false;
+
+	kvmppc_mmu_map_segment(vcpu, ea);
+}
+
+static void kvmppc_mmu_book3s_64_slbia(struct kvm_vcpu *vcpu)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	int i;
+
+	dprintk("KVM MMU: slbia()\n");
+
+	for (i = 1; i < vcpu_book3s->slb_nr; i++)
+		vcpu_book3s->slb[i].valid = false;
+
+	if (vcpu->arch.msr & MSR_IR) {
+		kvmppc_mmu_flush_segments(vcpu);
+		kvmppc_mmu_map_segment(vcpu, vcpu->arch.pc);
+	}
+}
+
+static void kvmppc_mmu_book3s_64_mtsrin(struct kvm_vcpu *vcpu, u32 srnum,
+					ulong value)
+{
+	u64 rb = 0, rs = 0;
+
+	/* ESID = srnum */
+	rb |= (srnum & 0xf) << 28;
+	/* Set the valid bit */
+	rb |= 1 << 27;
+	/* Index = ESID */
+	rb |= srnum;
+
+	/* VSID = VSID */
+	rs |= (value & 0xfffffff) << 12;
+	/* flags = flags */
+	rs |= ((value >> 27) & 0xf) << 9;
+
+	kvmppc_mmu_book3s_64_slbmte(vcpu, rs, rb);
+}
+
+static void kvmppc_mmu_book3s_64_tlbie(struct kvm_vcpu *vcpu, ulong va,
+				       bool large)
+{
+	u64 mask = 0xFFFFFFFFFULL;
+
+	dprintk("KVM MMU: tlbie(0x%lx)\n", va);
+
+	if (large)
+		mask = 0xFFFFFF000ULL;
+	kvmppc_mmu_pte_vflush(vcpu, va >> 12, mask);
+}
+
+static int kvmppc_mmu_book3s_64_esid_to_vsid(struct kvm_vcpu *vcpu, u64 esid,
+					     u64 *vsid)
+{
+	switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
+	case 0:
+		*vsid = (VSID_REAL >> 16) | esid;
+		break;
+	case MSR_IR:
+		*vsid = (VSID_REAL_IR >> 16) | esid;
+		break;
+	case MSR_DR:
+		*vsid = (VSID_REAL_DR >> 16) | esid;
+		break;
+	case MSR_DR|MSR_IR:
+	{
+		ulong ea;
+		struct kvmppc_slb *slb;
+		ea = esid << SID_SHIFT;
+		slb = kvmppc_mmu_book3s_64_find_slbe(to_book3s(vcpu), ea);
+		if (slb)
+			*vsid = slb->vsid;
+		else
+			return -ENOENT;
+
+		break;
+	}
+	default:
+		BUG();
+		break;
+	}
+
+	return 0;
+}
+
+static bool kvmppc_mmu_book3s_64_is_dcbz32(struct kvm_vcpu *vcpu)
+{
+	return (to_book3s(vcpu)->hid[5] & 0x80);
+}
+
+void kvmppc_mmu_book3s_64_init(struct kvm_vcpu *vcpu)
+{
+	struct kvmppc_mmu *mmu = &vcpu->arch.mmu;
+
+	mmu->mfsrin = NULL;
+	mmu->mtsrin = kvmppc_mmu_book3s_64_mtsrin;
+	mmu->slbmte = kvmppc_mmu_book3s_64_slbmte;
+	mmu->slbmfee = kvmppc_mmu_book3s_64_slbmfee;
+	mmu->slbmfev = kvmppc_mmu_book3s_64_slbmfev;
+	mmu->slbie = kvmppc_mmu_book3s_64_slbie;
+	mmu->slbia = kvmppc_mmu_book3s_64_slbia;
+	mmu->xlate = kvmppc_mmu_book3s_64_xlate;
+	mmu->reset_msr = kvmppc_mmu_book3s_64_reset_msr;
+	mmu->tlbie = kvmppc_mmu_book3s_64_tlbie;
+	mmu->esid_to_vsid = kvmppc_mmu_book3s_64_esid_to_vsid;
+	mmu->ea_to_vp = kvmppc_mmu_book3s_64_ea_to_vp;
+	mmu->is_dcbz32 = kvmppc_mmu_book3s_64_is_dcbz32;
+}
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 12/27] Add book3s_64 guest MMU
@ 2009-10-30 15:47     ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti, Olof Johansson,
	linuxppc-dev-mnsaURCQ41sdnm+yROfE0A

To be able to run a guest, we also need to implement a guest MMU.

This patch adds MMU handling for Book3s_64 guests.
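
The model is the classic hashed page table walk: the (virtual) SLB turns an
effective address into a VSID, the VSID is hashed with the page index, and the
result is masked by the HTABSIZE field of the guest's SDR1 to select a PTE
group. A standalone sketch of that address calculation (4k pages only; the
names here are illustrative, the real helper is kvmppc_mmu_book3s_64_get_pteg()
in the diff below):

#include <stdint.h>

/* Sketch only: guest-physical address of the PTEG that an effective
 * address hashes to. Mirrors the arithmetic used further down. */
static uint64_t example_pteg_addr(uint64_t vsid, uint64_t eaddr,
				  uint64_t sdr1, int secondary)
{
	uint64_t page = (eaddr & 0xfffffff) >> 12;		/* page index in segment */
	uint64_t htabsize = (1ULL << ((sdr1 & 0x1f) + 11)) - 1;	/* mask from HTABSIZE */
	uint64_t hash = vsid ^ page;

	if (secondary)
		hash = ~hash;					/* secondary hash function */
	hash &= (1ULL << 39) - 1;
	hash &= htabsize;

	return (sdr1 & 0xfffffffffffc0000ULL) | (hash << 7);	/* HTABORG | PTEG offset */
}

The xlate path below then copies the 16 doublewords of that group from guest
memory with copy_from_user() and compares AVPNs, instead of touching the host
hash table.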

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v5 -> v6:

  - dprintk instead of scattered #ifdef's
  - 80-character line limit
  - // -> /* */
---
 arch/powerpc/kvm/book3s_64_mmu.c |  476 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 476 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_mmu.c

diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
new file mode 100644
index 0000000..a31f9c6
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_mmu.c
@@ -0,0 +1,476 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/highmem.h>
+
+#include <asm/tlbflush.h>
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+
+/* #define DEBUG_MMU */
+
+#ifdef DEBUG_MMU
+#define dprintk(X...) printk(KERN_INFO X)
+#else
+#define dprintk(X...) do { } while(0)
+#endif
+
+static void kvmppc_mmu_book3s_64_reset_msr(struct kvm_vcpu *vcpu)
+{
+	kvmppc_set_msr(vcpu, MSR_SF);
+}
+
+static struct kvmppc_slb *kvmppc_mmu_book3s_64_find_slbe(
+				struct kvmppc_vcpu_book3s *vcpu_book3s,
+				gva_t eaddr)
+{
+	int i;
+	u64 esid = GET_ESID(eaddr);
+	u64 esid_1t = GET_ESID_1T(eaddr);
+
+	for (i = 0; i < vcpu_book3s->slb_nr; i++) {
+		u64 cmp_esid = esid;
+
+		if (!vcpu_book3s->slb[i].valid)
+			continue;
+
+		if (vcpu_book3s->slb[i].large)
+			cmp_esid = esid_1t;
+
+		if (vcpu_book3s->slb[i].esid == cmp_esid)
+			return &vcpu_book3s->slb[i];
+	}
+
+	dprintk("KVM: No SLB entry found for 0x%lx [%llx | %llx]\n",
+		eaddr, esid, esid_1t);
+	for (i = 0; i < vcpu_book3s->slb_nr; i++) {
+	    if (vcpu_book3s->slb[i].vsid)
+		dprintk("  %d: %c%c %llx %llx\n", i,
+			vcpu_book3s->slb[i].valid ? 'v' : ' ',
+			vcpu_book3s->slb[i].large ? 'l' : ' ',
+			vcpu_book3s->slb[i].esid,
+			vcpu_book3s->slb[i].vsid);
+	}
+
+	return NULL;
+}
+
+static u64 kvmppc_mmu_book3s_64_ea_to_vp(struct kvm_vcpu *vcpu, gva_t eaddr,
+					 bool data)
+{
+	struct kvmppc_slb *slb;
+
+	slb = kvmppc_mmu_book3s_64_find_slbe(to_book3s(vcpu), eaddr);
+	if (!slb)
+		return 0;
+
+	if (slb->large)
+		return (((u64)eaddr >> 12) & 0xfffffff) |
+		       (((u64)slb->vsid) << 28);
+
+	return (((u64)eaddr >> 12) & 0xffff) | (((u64)slb->vsid) << 16);
+}
+
+static int kvmppc_mmu_book3s_64_get_pagesize(struct kvmppc_slb *slbe)
+{
+	return slbe->large ? 24 : 12;
+}
+
+static u32 kvmppc_mmu_book3s_64_get_page(struct kvmppc_slb *slbe, gva_t eaddr)
+{
+	int p = kvmppc_mmu_book3s_64_get_pagesize(slbe);
+	return ((eaddr & 0xfffffff) >> p);
+}
+
+static hva_t kvmppc_mmu_book3s_64_get_pteg(
+				struct kvmppc_vcpu_book3s *vcpu_book3s,
+				struct kvmppc_slb *slbe, gva_t eaddr,
+				bool second)
+{
+	u64 hash, pteg, htabsize;
+	u32 page;
+	hva_t r;
+
+	page = kvmppc_mmu_book3s_64_get_page(slbe, eaddr);
+	htabsize = ((1 << ((vcpu_book3s->sdr1 & 0x1f) + 11)) - 1);
+
+	hash = slbe->vsid ^ page;
+	if (second)
+		hash = ~hash;
+	hash &= ((1ULL << 39ULL) - 1ULL);
+	hash &= htabsize;
+	hash <<= 7ULL;
+
+	pteg = vcpu_book3s->sdr1 & 0xfffffffffffc0000ULL;
+	pteg |= hash;
+
+	dprintk("MMU: page=0x%x sdr1=0x%llx pteg=0x%llx vsid=0x%llx\n",
+		page, vcpu_book3s->sdr1, pteg, slbe->vsid);
+
+	r = gfn_to_hva(vcpu_book3s->vcpu.kvm, pteg >> PAGE_SHIFT);
+	if (kvm_is_error_hva(r))
+		return r;
+	return r | (pteg & ~PAGE_MASK);
+}
+
+static u64 kvmppc_mmu_book3s_64_get_avpn(struct kvmppc_slb *slbe, gva_t eaddr)
+{
+	int p = kvmppc_mmu_book3s_64_get_pagesize(slbe);
+	u64 avpn;
+
+	avpn = kvmppc_mmu_book3s_64_get_page(slbe, eaddr);
+	avpn |= slbe->vsid << (28 - p);
+
+	if (p < 24)
+		avpn >>= ((80 - p) - 56) - 8;
+	else
+		avpn <<= 8;
+
+	return avpn;
+}
+
+static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
+				struct kvmppc_pte *gpte, bool data)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_slb *slbe;
+	hva_t ptegp;
+	u64 pteg[16];
+	u64 avpn = 0;
+	int i;
+	u8 key = 0;
+	bool found = false;
+	bool perm_err = false;
+	int second = 0;
+
+	slbe = kvmppc_mmu_book3s_64_find_slbe(vcpu_book3s, eaddr);
+	if (!slbe)
+		goto no_seg_found;
+
+do_second:
+	ptegp = kvmppc_mmu_book3s_64_get_pteg(vcpu_book3s, slbe, eaddr, second);
+	if (kvm_is_error_hva(ptegp))
+		goto no_page_found;
+
+	avpn = kvmppc_mmu_book3s_64_get_avpn(slbe, eaddr);
+
+	if(copy_from_user(pteg, (void __user *)ptegp, sizeof(pteg))) {
+		printk(KERN_ERR "KVM can't copy data from 0x%lx!\n", ptegp);
+		goto no_page_found;
+	}
+
+	if ((vcpu->arch.msr & MSR_PR) && slbe->Kp)
+		key = 4;
+	else if (!(vcpu->arch.msr & MSR_PR) && slbe->Ks)
+		key = 4;
+
+	for (i=0; i<16; i+=2) {
+		u64 v = pteg[i];
+		u64 r = pteg[i+1];
+
+		/* Valid check */
+		if (!(v & HPTE_V_VALID))
+			continue;
+		/* Hash check */
+		if ((v & HPTE_V_SECONDARY) != second)
+			continue;
+
+		/* AVPN compare */
+		if (HPTE_V_AVPN_VAL(avpn) == HPTE_V_AVPN_VAL(v)) {
+			u8 pp = (r & HPTE_R_PP) | key;
+			int eaddr_mask = 0xFFF;
+
+			gpte->eaddr = eaddr;
+			gpte->vpage = kvmppc_mmu_book3s_64_ea_to_vp(vcpu,
+								    eaddr,
+								    data);
+			if (slbe->large)
+				eaddr_mask = 0xFFFFFF;
+			gpte->raddr = (r & HPTE_R_RPN) | (eaddr & eaddr_mask);
+			gpte->may_execute = ((r & HPTE_R_N) ? false : true);
+			gpte->may_read = false;
+			gpte->may_write = false;
+
+			switch (pp) {
+			case 0:
+			case 1:
+			case 2:
+			case 6:
+				gpte->may_write = true;
+				/* fall through */
+			case 3:
+			case 5:
+			case 7:
+				gpte->may_read = true;
+				break;
+			}
+
+			if (!gpte->may_read) {
+				perm_err = true;
+				continue;
+			}
+
+			dprintk("KVM MMU: Translated 0x%lx [0x%llx] -> 0x%llx "
+				"-> 0x%llx\n",
+				eaddr, avpn, gpte->vpage, gpte->raddr);
+			found = true;
+			break;
+		}
+	}
+
+	/* Update PTE R and C bits, so the guest's swapper knows we used the
+	 * page */
+	if (found) {
+		u64 oldr = pteg[i+1];
+
+		if (gpte->may_read) {
+			/* Set the accessed flag */
+			pteg[i+1] |= HPTE_R_R;
+		}
+		if (gpte->may_write) {
+			/* Set the dirty flag */
+			pteg[i+1] |= HPTE_R_C;
+		} else {
+			dprintk("KVM: Mapping read-only page!\n");
+		}
+
+		/* Write back into the PTEG */
+		if (pteg[i+1] != oldr)
+			copy_to_user((void __user *)ptegp, pteg, sizeof(pteg));
+
+		return 0;
+	} else {
+		dprintk("KVM MMU: No PTE found (ea=0x%lx sdr1=0x%llx "
+			"ptegp=0x%lx)\n",
+			eaddr, to_book3s(vcpu)->sdr1, ptegp);
+		for (i = 0; i < 16; i += 2)
+			dprintk("   %02d: 0x%llx - 0x%llx (0x%llx)\n",
+				i, pteg[i], pteg[i+1], avpn);
+
+		if (!second) {
+			second = HPTE_V_SECONDARY;
+			goto do_second;
+		}
+	}
+
+
+no_page_found:
+
+
+	if (perm_err)
+		return -EPERM;
+
+	return -ENOENT;
+
+no_seg_found:
+
+	dprintk("KVM MMU: Trigger segment fault\n");
+	return -EINVAL;
+}
+
+static void kvmppc_mmu_book3s_64_slbmte(struct kvm_vcpu *vcpu, u64 rs, u64 rb)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s;
+	u64 esid, esid_1t;
+	int slb_nr;
+	struct kvmppc_slb *slbe;
+
+	dprintk("KVM MMU: slbmte(0x%llx, 0x%llx)\n", rs, rb);
+
+	vcpu_book3s = to_book3s(vcpu);
+
+	esid = GET_ESID(rb);
+	esid_1t = GET_ESID_1T(rb);
+	slb_nr = rb & 0xfff;
+
+	if (slb_nr > vcpu_book3s->slb_nr)
+		return;
+
+	slbe = &vcpu_book3s->slb[slb_nr];
+
+	slbe->large = (rs & SLB_VSID_L) ? 1 : 0;
+	slbe->esid  = slbe->large ? esid_1t : esid;
+	slbe->vsid  = rs >> 12;
+	slbe->valid = (rb & SLB_ESID_V) ? 1 : 0;
+	slbe->Ks    = (rs & SLB_VSID_KS) ? 1 : 0;
+	slbe->Kp    = (rs & SLB_VSID_KP) ? 1 : 0;
+	slbe->nx    = (rs & SLB_VSID_N) ? 1 : 0;
+	slbe->class = (rs & SLB_VSID_C) ? 1 : 0;
+
+	slbe->orige = rb & (ESID_MASK | SLB_ESID_V);
+	slbe->origv = rs;
+
+	/* Map the new segment */
+	kvmppc_mmu_map_segment(vcpu, esid << SID_SHIFT);
+}
+
+static u64 kvmppc_mmu_book3s_64_slbmfee(struct kvm_vcpu *vcpu, u64 slb_nr)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_slb *slbe;
+
+	if (slb_nr > vcpu_book3s->slb_nr)
+		return 0;
+
+	slbe = &vcpu_book3s->slb[slb_nr];
+
+	return slbe->orige;
+}
+
+static u64 kvmppc_mmu_book3s_64_slbmfev(struct kvm_vcpu *vcpu, u64 slb_nr)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_slb *slbe;
+
+	if (slb_nr > vcpu_book3s->slb_nr)
+		return 0;
+
+	slbe = &vcpu_book3s->slb[slb_nr];
+
+	return slbe->origv;
+}
+
+static void kvmppc_mmu_book3s_64_slbie(struct kvm_vcpu *vcpu, u64 ea)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_slb *slbe;
+
+	dprintk("KVM MMU: slbie(0x%llx)\n", ea);
+
+	slbe = kvmppc_mmu_book3s_64_find_slbe(vcpu_book3s, ea);
+
+	if (!slbe)
+		return;
+
+	dprintk("KVM MMU: slbie(0x%llx, 0x%llx)\n", ea, slbe->esid);
+
+	slbe->valid = false;
+
+	kvmppc_mmu_map_segment(vcpu, ea);
+}
+
+static void kvmppc_mmu_book3s_64_slbia(struct kvm_vcpu *vcpu)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	int i;
+
+	dprintk("KVM MMU: slbia()\n");
+
+	for (i = 1; i < vcpu_book3s->slb_nr; i++)
+		vcpu_book3s->slb[i].valid = false;
+
+	if (vcpu->arch.msr & MSR_IR) {
+		kvmppc_mmu_flush_segments(vcpu);
+		kvmppc_mmu_map_segment(vcpu, vcpu->arch.pc);
+	}
+}
+
+static void kvmppc_mmu_book3s_64_mtsrin(struct kvm_vcpu *vcpu, u32 srnum,
+					ulong value)
+{
+	u64 rb = 0, rs = 0;
+
+	/* ESID = srnum */
+	rb |= (srnum & 0xf) << 28;
+	/* Set the valid bit */
+	rb |= 1 << 27;
+	/* Index = ESID */
+	rb |= srnum;
+
+	/* VSID = VSID */
+	rs |= (value & 0xfffffff) << 12;
+	/* flags = flags */
+	rs |= ((value >> 27) & 0xf) << 9;
+
+	kvmppc_mmu_book3s_64_slbmte(vcpu, rs, rb);
+}
+
+static void kvmppc_mmu_book3s_64_tlbie(struct kvm_vcpu *vcpu, ulong va,
+				       bool large)
+{
+	u64 mask = 0xFFFFFFFFFULL;
+
+	dprintk("KVM MMU: tlbie(0x%lx)\n", va);
+
+	if (large)
+		mask = 0xFFFFFF000ULL;
+	kvmppc_mmu_pte_vflush(vcpu, va >> 12, mask);
+}
+
+static int kvmppc_mmu_book3s_64_esid_to_vsid(struct kvm_vcpu *vcpu, u64 esid,
+					     u64 *vsid)
+{
+	switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
+	case 0:
+		*vsid = (VSID_REAL >> 16) | esid;
+		break;
+	case MSR_IR:
+		*vsid = (VSID_REAL_IR >> 16) | esid;
+		break;
+	case MSR_DR:
+		*vsid = (VSID_REAL_DR >> 16) | esid;
+		break;
+	case MSR_DR|MSR_IR:
+	{
+		ulong ea;
+		struct kvmppc_slb *slb;
+		ea = esid << SID_SHIFT;
+		slb = kvmppc_mmu_book3s_64_find_slbe(to_book3s(vcpu), ea);
+		if (slb)
+			*vsid = slb->vsid;
+		else
+			return -ENOENT;
+
+		break;
+	}
+	default:
+		BUG();
+		break;
+	}
+
+	return 0;
+}
+
+static bool kvmppc_mmu_book3s_64_is_dcbz32(struct kvm_vcpu *vcpu)
+{
+	return (to_book3s(vcpu)->hid[5] & 0x80);
+}
+
+void kvmppc_mmu_book3s_64_init(struct kvm_vcpu *vcpu)
+{
+	struct kvmppc_mmu *mmu = &vcpu->arch.mmu;
+
+	mmu->mfsrin = NULL;
+	mmu->mtsrin = kvmppc_mmu_book3s_64_mtsrin;
+	mmu->slbmte = kvmppc_mmu_book3s_64_slbmte;
+	mmu->slbmfee = kvmppc_mmu_book3s_64_slbmfee;
+	mmu->slbmfev = kvmppc_mmu_book3s_64_slbmfev;
+	mmu->slbie = kvmppc_mmu_book3s_64_slbie;
+	mmu->slbia = kvmppc_mmu_book3s_64_slbia;
+	mmu->xlate = kvmppc_mmu_book3s_64_xlate;
+	mmu->reset_msr = kvmppc_mmu_book3s_64_reset_msr;
+	mmu->tlbie = kvmppc_mmu_book3s_64_tlbie;
+	mmu->esid_to_vsid = kvmppc_mmu_book3s_64_esid_to_vsid;
+	mmu->ea_to_vp = kvmppc_mmu_book3s_64_ea_to_vp;
+	mmu->is_dcbz32 = kvmppc_mmu_book3s_64_is_dcbz32;
+}
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 13/27] Add book3s_32 guest MMU
  2009-10-30 15:47 ` Alexander Graf
  (?)
@ 2009-10-30 15:47     ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti, Olof Johansson,
	linuxppc-dev-mnsaURCQ41sdnm+yROfE0A

This patch adds an implementation for a G3/G4 MMU, so we can run G3 and
G4 guests in KVM on Book3s_64.
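
The 32-bit scheme is the same idea with smaller fields: a segment register
supplies the VSID, SDR1 carries HTABORG plus a 9-bit hash mask, and each PTE
group is 64 bytes. A standalone sketch of the address calculation (names are
illustrative, the real helper is kvmppc_mmu_book3s_32_get_pteg() in the diff
below):

#include <stdint.h>

/* Sketch only: guest-physical address of the 32-bit PTEG for an
 * effective address. Mirrors the arithmetic used further down. */
static uint32_t example_pteg_addr_32(uint32_t vsid, uint32_t eaddr,
				     uint32_t sdr1, int primary)
{
	uint32_t page = (eaddr & 0x0fffffff) >> 12;		/* page index in segment */
	uint32_t htabmask = ((sdr1 & 0x1ff) << 16) | 0xffc0;	/* mask built from HTABMASK */
	uint32_t hash = (vsid ^ page) << 6;			/* 64-byte PTE groups */

	if (!primary)
		hash = ~hash;					/* secondary hash function */
	hash &= htabmask;

	return (sdr1 & 0xffff0000) | hash;			/* HTABORG | PTEG offset */
}

BAT translations are tried first in the xlate path, so this lookup only runs
for addresses that no IBAT/DBAT pair covers.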

Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>

---

v5 -> v6:

  - dprintk instead of scattered #ifdef's
  - // -> /* */
  - 80 characters per line
---
 arch/powerpc/kvm/book3s_32_mmu.c |  372 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 372 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_32_mmu.c

diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
new file mode 100644
index 0000000..faf99f2
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -0,0 +1,372 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
+ */
+
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/highmem.h>
+
+#include <asm/tlbflush.h>
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+
+/* #define DEBUG_MMU */
+/* #define DEBUG_MMU_PTE */
+/* #define DEBUG_MMU_PTE_IP 0xfff14c40 */
+
+#ifdef DEBUG_MMU
+#define dprintk(X...) printk(KERN_INFO X)
+#else
+#define dprintk(X...) do { } while(0)
+#endif
+
+#ifdef DEBUG_PTE
+#define dprintk_pte(X...) printk(KERN_INFO X)
+#else
+#define dprintk_pte(X...) do { } while(0)
+#endif
+
+#define PTEG_FLAG_ACCESSED	0x00000100
+#define PTEG_FLAG_DIRTY		0x00000080
+
+static inline bool check_debug_ip(struct kvm_vcpu *vcpu)
+{
+#ifdef DEBUG_MMU_PTE_IP
+	return vcpu->arch.pc == DEBUG_MMU_PTE_IP;
+#else
+	return true;
+#endif
+}
+
+static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu *vcpu, gva_t eaddr,
+					  struct kvmppc_pte *pte, bool data);
+
+static struct kvmppc_sr *find_sr(struct kvmppc_vcpu_book3s *vcpu_book3s, gva_t eaddr)
+{
+	return &vcpu_book3s->sr[(eaddr >> 28) & 0xf];
+}
+
+static u64 kvmppc_mmu_book3s_32_ea_to_vp(struct kvm_vcpu *vcpu, gva_t eaddr,
+					 bool data)
+{
+	struct kvmppc_sr *sre = find_sr(to_book3s(vcpu), eaddr);
+	struct kvmppc_pte pte;
+
+	if (!kvmppc_mmu_book3s_32_xlate_bat(vcpu, eaddr, &pte, data))
+		return pte.vpage;
+
+	return (((u64)eaddr >> 12) & 0xffff) | (((u64)sre->vsid) << 16);
+}
+
+static void kvmppc_mmu_book3s_32_reset_msr(struct kvm_vcpu *vcpu)
+{
+	kvmppc_set_msr(vcpu, 0);
+}
+
+static hva_t kvmppc_mmu_book3s_32_get_pteg(struct kvmppc_vcpu_book3s *vcpu_book3s,
+				      struct kvmppc_sr *sre, gva_t eaddr,
+				      bool primary)
+{
+	u32 page, hash, pteg, htabmask;
+	hva_t r;
+
+	page = (eaddr & 0x0FFFFFFF) >> 12;
+	htabmask = ((vcpu_book3s->sdr1 & 0x1FF) << 16) | 0xFFC0;
+
+	hash = ((sre->vsid ^ page) << 6);
+	if (!primary)
+		hash = ~hash;
+	hash &= htabmask;
+
+	pteg = (vcpu_book3s->sdr1 & 0xffff0000) | hash;
+
+	dprintk("MMU: pc=0x%lx eaddr=0x%lx sdr1=0x%llx pteg=0x%x vsid=0x%x\n",
+		vcpu_book3s->vcpu.arch.pc, eaddr, vcpu_book3s->sdr1, pteg,
+		sre->vsid);
+
+	r = gfn_to_hva(vcpu_book3s->vcpu.kvm, pteg >> PAGE_SHIFT);
+	if (kvm_is_error_hva(r))
+		return r;
+	return r | (pteg & ~PAGE_MASK);
+}
+
+static u32 kvmppc_mmu_book3s_32_get_ptem(struct kvmppc_sr *sre, gva_t eaddr,
+				    bool primary)
+{
+	return ((eaddr & 0x0fffffff) >> 22) | (sre->vsid << 7) |
+	       (primary ? 0 : 0x40) | 0x80000000;
+}
+
+static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu *vcpu, gva_t eaddr,
+					  struct kvmppc_pte *pte, bool data)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_bat *bat;
+	int i;
+
+	for (i = 0; i < 8; i++) {
+		if (data)
+			bat = &vcpu_book3s->dbat[i];
+		else
+			bat = &vcpu_book3s->ibat[i];
+
+		if (vcpu->arch.msr & MSR_PR) {
+			if (!bat->vp)
+				continue;
+		} else {
+			if (!bat->vs)
+				continue;
+		}
+
+		if (check_debug_ip(vcpu))
+		{
+			dprintk_pte("%cBAT %02d: 0x%lx - 0x%x (0x%x)\n",
+				    data ? 'd' : 'i', i, eaddr, bat->bepi,
+				    bat->bepi_mask);
+		}
+		if ((eaddr & bat->bepi_mask) == bat->bepi) {
+			pte->raddr = bat->brpn | (eaddr & ~bat->bepi_mask);
+			pte->vpage = (eaddr >> 12) | VSID_BAT;
+			pte->may_read = bat->pp;
+			pte->may_write = bat->pp > 1;
+			pte->may_execute = true;
+			if (!pte->may_read) {
+				printk(KERN_INFO "BAT is not readable!\n");
+				continue;
+			}
+			if (!pte->may_write) {
+				/* let's treat r/o BATs as not-readable for now */
+				dprintk_pte("BAT is read-only!\n");
+				continue;
+			}
+
+			return 0;
+		}
+	}
+
+	return -ENOENT;
+}
+
+static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr,
+				     struct kvmppc_pte *pte, bool data,
+				     bool primary)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_sr *sre;
+	hva_t ptegp;
+	u32 pteg[16];
+	u64 ptem = 0;
+	int i;
+	int found = 0;
+
+	sre = find_sr(vcpu_book3s, eaddr);
+
+	dprintk_pte("SR 0x%lx: vsid=0x%x, raw=0x%x\n", eaddr >> 28,
+		    sre->vsid, sre->raw);
+
+	pte->vpage = kvmppc_mmu_book3s_32_ea_to_vp(vcpu, eaddr, data);
+
+	ptegp = kvmppc_mmu_book3s_32_get_pteg(vcpu_book3s, sre, eaddr, primary);
+	if (kvm_is_error_hva(ptegp)) {
+		printk(KERN_INFO "KVM: Invalid PTEG!\n");
+		goto no_page_found;
+	}
+
+	ptem = kvmppc_mmu_book3s_32_get_ptem(sre, eaddr, primary);
+
+	if(copy_from_user(pteg, (void __user *)ptegp, sizeof(pteg))) {
+		printk(KERN_ERR "KVM: Can't copy data from 0x%lx!\n", ptegp);
+		goto no_page_found;
+	}
+
+	for (i=0; i<16; i+=2) {
+		if (ptem == pteg[i]) {
+			u8 pp;
+
+			pte->raddr = (pteg[i+1] & ~(0xFFFULL)) | (eaddr & 0xFFF);
+			pp = pteg[i+1] & 3;
+
+			if ((sre->Kp &&  (vcpu->arch.msr & MSR_PR)) ||
+			    (sre->Ks && !(vcpu->arch.msr & MSR_PR)))
+				pp |= 4;
+
+			pte->may_write = false;
+			pte->may_read = false;
+			pte->may_execute = true;
+			switch (pp) {
+				case 0:
+				case 1:
+				case 2:
+				case 6:
+					pte->may_write = true;
+				case 3:
+				case 5:
+				case 7:
+					pte->may_read = true;
+					break;
+			}
+
+			if ( !pte->may_read )
+				continue;
+
+			dprintk_pte("MMU: Found PTE -> %x %x - %x\n",
+				    pteg[i], pteg[i+1], pp);
+			found = 1;
+			break;
+		}
+	}
+
+	/* Update PTE C and A bits, so the guest's swapper knows we used the
+	   page */
+	if (found) {
+		u32 oldpte = pteg[i+1];
+
+		if (pte->may_read)
+			pteg[i+1] |= PTEG_FLAG_ACCESSED;
+		if (pte->may_write)
+			pteg[i+1] |= PTEG_FLAG_DIRTY;
+		else
+			dprintk_pte("KVM: Mapping read-only page!\n");
+
+		/* Write back into the PTEG */
+		if (pteg[i+1] != oldpte)
+			copy_to_user((void __user *)ptegp, pteg, sizeof(pteg));
+
+		return 0;
+	}
+
+no_page_found:
+
+	if (check_debug_ip(vcpu)) {
+		dprintk_pte("KVM MMU: No PTE found (sdr1=0x%llx ptegp=0x%lx)\n",
+			    to_book3s(vcpu)->sdr1, ptegp);
+		for (i=0; i<16; i+=2) {
+			dprintk_pte("   %02d: 0x%x - 0x%x (0x%llx)\n",
+				    i, pteg[i], pteg[i+1], ptem);
+		}
+	}
+
+	return -ENOENT;
+}
+
+static int kvmppc_mmu_book3s_32_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
+				      struct kvmppc_pte *pte, bool data)
+{
+	int r;
+
+	pte->eaddr = eaddr;
+	r = kvmppc_mmu_book3s_32_xlate_bat(vcpu, eaddr, pte, data);
+	if (r < 0)
+	       r = kvmppc_mmu_book3s_32_xlate_pte(vcpu, eaddr, pte, data, true);
+	if (r < 0)
+	       r = kvmppc_mmu_book3s_32_xlate_pte(vcpu, eaddr, pte, data, false);
+
+	return r;
+}
+
+
+static u32 kvmppc_mmu_book3s_32_mfsrin(struct kvm_vcpu *vcpu, u32 srnum)
+{
+	return to_book3s(vcpu)->sr[srnum].raw;
+}
+
+static void kvmppc_mmu_book3s_32_mtsrin(struct kvm_vcpu *vcpu, u32 srnum,
+					ulong value)
+{
+	struct kvmppc_sr *sre;
+
+	sre = &to_book3s(vcpu)->sr[srnum];
+
+	/* Flush any left-over shadows from the previous SR */
+
+	/* XXX Not necessary? */
+	/* kvmppc_mmu_pte_flush(vcpu, ((u64)sre->vsid) << 28, 0xf0000000ULL); */
+
+	/* And then put in the new SR */
+	sre->raw = value;
+	sre->vsid = (value & 0x0fffffff);
+	sre->Ks = (value & 0x40000000) ? true : false;
+	sre->Kp = (value & 0x20000000) ? true : false;
+	sre->nx = (value & 0x10000000) ? true : false;
+
+	/* Map the new segment */
+	kvmppc_mmu_map_segment(vcpu, srnum << SID_SHIFT);
+}
+
+static void kvmppc_mmu_book3s_32_tlbie(struct kvm_vcpu *vcpu, ulong ea, bool large)
+{
+	kvmppc_mmu_pte_flush(vcpu, ea, ~0xFFFULL);
+}
+
+static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, u64 esid,
+					     u64 *vsid)
+{
+	/* In case we only have one of MSR_IR or MSR_DR set, let's put
+	   that in the real-mode context (and hope RM doesn't access
+	   high memory) */
+	switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
+	case 0:
+		*vsid = (VSID_REAL >> 16) | esid;
+		break;
+	case MSR_IR:
+		*vsid = (VSID_REAL_IR >> 16) | esid;
+		break;
+	case MSR_DR:
+		*vsid = (VSID_REAL_DR >> 16) | esid;
+		break;
+	case MSR_DR|MSR_IR:
+	{
+		ulong ea;
+		ea = esid << SID_SHIFT;
+		*vsid = find_sr(to_book3s(vcpu), ea)->vsid;
+		break;
+	}
+	default:
+		BUG();
+	}
+
+	return 0;
+}
+
+static bool kvmppc_mmu_book3s_32_is_dcbz32(struct kvm_vcpu *vcpu)
+{
+	return true;
+}
+
+
+void kvmppc_mmu_book3s_32_init(struct kvm_vcpu *vcpu)
+{
+	struct kvmppc_mmu *mmu = &vcpu->arch.mmu;
+
+	mmu->mtsrin = kvmppc_mmu_book3s_32_mtsrin;
+	mmu->mfsrin = kvmppc_mmu_book3s_32_mfsrin;
+	mmu->xlate = kvmppc_mmu_book3s_32_xlate;
+	mmu->reset_msr = kvmppc_mmu_book3s_32_reset_msr;
+	mmu->tlbie = kvmppc_mmu_book3s_32_tlbie;
+	mmu->esid_to_vsid = kvmppc_mmu_book3s_32_esid_to_vsid;
+	mmu->ea_to_vp = kvmppc_mmu_book3s_32_ea_to_vp;
+	mmu->is_dcbz32 = kvmppc_mmu_book3s_32_is_dcbz32;
+
+	mmu->slbmte = NULL;
+	mmu->slbmfee = NULL;
+	mmu->slbmfev = NULL;
+	mmu->slbie = NULL;
+	mmu->slbia = NULL;
+}
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 13/27] Add book3s_32 guest MMU
@ 2009-10-30 15:47     ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Kevin Wolf, Arnd Bergmann, Hollis Blanchard, Marcelo Tosatti,
	kvm-ppc, linuxppc-dev, Avi Kivity, bphilips, Olof Johansson

This patch adds an implementation for a G3/G4 MMU, so we can run G3 and
G4 guests in KVM on Book3s_64.

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v5 -> v6:

  - dprintk instead of scattered #ifdef's
  - // -> /* */
  - 80 characters per line
---
 arch/powerpc/kvm/book3s_32_mmu.c |  372 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 372 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_32_mmu.c

diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
new file mode 100644
index 0000000..faf99f2
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -0,0 +1,372 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/highmem.h>
+
+#include <asm/tlbflush.h>
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+
+/* #define DEBUG_MMU */
+/* #define DEBUG_MMU_PTE */
+/* #define DEBUG_MMU_PTE_IP 0xfff14c40 */
+
+#ifdef DEBUG_MMU
+#define dprintk(X...) printk(KERN_INFO X)
+#else
+#define dprintk(X...) do { } while(0)
+#endif
+
+#ifdef DEBUG_PTE
+#define dprintk_pte(X...) printk(KERN_INFO X)
+#else
+#define dprintk_pte(X...) do { } while(0)
+#endif
+
+#define PTEG_FLAG_ACCESSED	0x00000100
+#define PTEG_FLAG_DIRTY		0x00000080
+
+static inline bool check_debug_ip(struct kvm_vcpu *vcpu)
+{
+#ifdef DEBUG_MMU_PTE_IP
+	return vcpu->arch.pc == DEBUG_MMU_PTE_IP;
+#else
+	return true;
+#endif
+}
+
+static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu *vcpu, gva_t eaddr,
+					  struct kvmppc_pte *pte, bool data);
+
+static struct kvmppc_sr *find_sr(struct kvmppc_vcpu_book3s *vcpu_book3s, gva_t eaddr)
+{
+	return &vcpu_book3s->sr[(eaddr >> 28) & 0xf];
+}
+
+static u64 kvmppc_mmu_book3s_32_ea_to_vp(struct kvm_vcpu *vcpu, gva_t eaddr,
+					 bool data)
+{
+	struct kvmppc_sr *sre = find_sr(to_book3s(vcpu), eaddr);
+	struct kvmppc_pte pte;
+
+	if (!kvmppc_mmu_book3s_32_xlate_bat(vcpu, eaddr, &pte, data))
+		return pte.vpage;
+
+	return (((u64)eaddr >> 12) & 0xffff) | (((u64)sre->vsid) << 16);
+}
+
+static void kvmppc_mmu_book3s_32_reset_msr(struct kvm_vcpu *vcpu)
+{
+	kvmppc_set_msr(vcpu, 0);
+}
+
+static hva_t kvmppc_mmu_book3s_32_get_pteg(struct kvmppc_vcpu_book3s *vcpu_book3s,
+				      struct kvmppc_sr *sre, gva_t eaddr,
+				      bool primary)
+{
+	u32 page, hash, pteg, htabmask;
+	hva_t r;
+
+	page = (eaddr & 0x0FFFFFFF) >> 12;
+	htabmask = ((vcpu_book3s->sdr1 & 0x1FF) << 16) | 0xFFC0;
+
+	hash = ((sre->vsid ^ page) << 6);
+	if (!primary)
+		hash = ~hash;
+	hash &= htabmask;
+
+	pteg = (vcpu_book3s->sdr1 & 0xffff0000) | hash;
+
+	dprintk("MMU: pc=0x%lx eaddr=0x%lx sdr1=0x%llx pteg=0x%x vsid=0x%x\n",
+		vcpu_book3s->vcpu.arch.pc, eaddr, vcpu_book3s->sdr1, pteg,
+		sre->vsid);
+
+	r = gfn_to_hva(vcpu_book3s->vcpu.kvm, pteg >> PAGE_SHIFT);
+	if (kvm_is_error_hva(r))
+		return r;
+	return r | (pteg & ~PAGE_MASK);
+}
+
+static u32 kvmppc_mmu_book3s_32_get_ptem(struct kvmppc_sr *sre, gva_t eaddr,
+				    bool primary)
+{
+	return ((eaddr & 0x0fffffff) >> 22) | (sre->vsid << 7) |
+	       (primary ? 0 : 0x40) | 0x80000000;
+}
+
+static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu *vcpu, gva_t eaddr,
+					  struct kvmppc_pte *pte, bool data)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_bat *bat;
+	int i;
+
+	for (i = 0; i < 8; i++) {
+		if (data)
+			bat = &vcpu_book3s->dbat[i];
+		else
+			bat = &vcpu_book3s->ibat[i];
+
+		if (vcpu->arch.msr & MSR_PR) {
+			if (!bat->vp)
+				continue;
+		} else {
+			if (!bat->vs)
+				continue;
+		}
+
+		if (check_debug_ip(vcpu))
+		{
+			dprintk_pte("%cBAT %02d: 0x%lx - 0x%x (0x%x)\n",
+				    data ? 'd' : 'i', i, eaddr, bat->bepi,
+				    bat->bepi_mask);
+		}
+		if ((eaddr & bat->bepi_mask) == bat->bepi) {
+			pte->raddr = bat->brpn | (eaddr & ~bat->bepi_mask);
+			pte->vpage = (eaddr >> 12) | VSID_BAT;
+			pte->may_read = bat->pp;
+			pte->may_write = bat->pp > 1;
+			pte->may_execute = true;
+			if (!pte->may_read) {
+				printk(KERN_INFO "BAT is not readable!\n");
+				continue;
+			}
+			if (!pte->may_write) {
+				/* let's treat r/o BATs as not-readable for now */
+				dprintk_pte("BAT is read-only!\n");
+				continue;
+			}
+
+			return 0;
+		}
+	}
+
+	return -ENOENT;
+}
+
+static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr,
+				     struct kvmppc_pte *pte, bool data,
+				     bool primary)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_sr *sre;
+	hva_t ptegp;
+	u32 pteg[16];
+	u64 ptem = 0;
+	int i;
+	int found = 0;
+
+	sre = find_sr(vcpu_book3s, eaddr);
+
+	dprintk_pte("SR 0x%lx: vsid=0x%x, raw=0x%x\n", eaddr >> 28,
+		    sre->vsid, sre->raw);
+
+	pte->vpage = kvmppc_mmu_book3s_32_ea_to_vp(vcpu, eaddr, data);
+
+	ptegp = kvmppc_mmu_book3s_32_get_pteg(vcpu_book3s, sre, eaddr, primary);
+	if (kvm_is_error_hva(ptegp)) {
+		printk(KERN_INFO "KVM: Invalid PTEG!\n");
+		goto no_page_found;
+	}
+
+	ptem = kvmppc_mmu_book3s_32_get_ptem(sre, eaddr, primary);
+
+	if(copy_from_user(pteg, (void __user *)ptegp, sizeof(pteg))) {
+		printk(KERN_ERR "KVM: Can't copy data from 0x%lx!\n", ptegp);
+		goto no_page_found;
+	}
+
+	for (i=0; i<16; i+=2) {
+		if (ptem == pteg[i]) {
+			u8 pp;
+
+			pte->raddr = (pteg[i+1] & ~(0xFFFULL)) | (eaddr & 0xFFF);
+			pp = pteg[i+1] & 3;
+
+			if ((sre->Kp &&  (vcpu->arch.msr & MSR_PR)) ||
+			    (sre->Ks && !(vcpu->arch.msr & MSR_PR)))
+				pp |= 4;
+
+			pte->may_write = false;
+			pte->may_read = false;
+			pte->may_execute = true;
+			switch (pp) {
+				case 0:
+				case 1:
+				case 2:
+				case 6:
+					pte->may_write = true;
+				case 3:
+				case 5:
+				case 7:
+					pte->may_read = true;
+					break;
+			}
+
+			if ( !pte->may_read )
+				continue;
+
+			dprintk_pte("MMU: Found PTE -> %x %x - %x\n",
+				    pteg[i], pteg[i+1], pp);
+			found = 1;
+			break;
+		}
+	}
+
+	/* Update PTE C and A bits, so the guest's swapper knows we used the
+	   page */
+	if (found) {
+		u32 oldpte = pteg[i+1];
+
+		if (pte->may_read)
+			pteg[i+1] |= PTEG_FLAG_ACCESSED;
+		if (pte->may_write)
+			pteg[i+1] |= PTEG_FLAG_DIRTY;
+		else
+			dprintk_pte("KVM: Mapping read-only page!\n");
+
+		/* Write back into the PTEG */
+		if (pteg[i+1] != oldpte)
+			copy_to_user((void __user *)ptegp, pteg, sizeof(pteg));
+
+		return 0;
+	}
+
+no_page_found:
+
+	if (check_debug_ip(vcpu)) {
+		dprintk_pte("KVM MMU: No PTE found (sdr1=0x%llx ptegp=0x%lx)\n",
+			    to_book3s(vcpu)->sdr1, ptegp);
+		for (i=0; i<16; i+=2) {
+			dprintk_pte("   %02d: 0x%x - 0x%x (0x%llx)\n",
+				    i, pteg[i], pteg[i+1], ptem);
+		}
+	}
+
+	return -ENOENT;
+}
+
+static int kvmppc_mmu_book3s_32_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
+				      struct kvmppc_pte *pte, bool data)
+{
+	int r;
+
+	pte->eaddr = eaddr;
+	r = kvmppc_mmu_book3s_32_xlate_bat(vcpu, eaddr, pte, data);
+	if (r < 0)
+	       r = kvmppc_mmu_book3s_32_xlate_pte(vcpu, eaddr, pte, data, true);
+	if (r < 0)
+	       r = kvmppc_mmu_book3s_32_xlate_pte(vcpu, eaddr, pte, data, false);
+
+	return r;
+}
+
+
+static u32 kvmppc_mmu_book3s_32_mfsrin(struct kvm_vcpu *vcpu, u32 srnum)
+{
+	return to_book3s(vcpu)->sr[srnum].raw;
+}
+
+static void kvmppc_mmu_book3s_32_mtsrin(struct kvm_vcpu *vcpu, u32 srnum,
+					ulong value)
+{
+	struct kvmppc_sr *sre;
+
+	sre = &to_book3s(vcpu)->sr[srnum];
+
+	/* Flush any left-over shadows from the previous SR */
+
+	/* XXX Not necessary? */
+	/* kvmppc_mmu_pte_flush(vcpu, ((u64)sre->vsid) << 28, 0xf0000000ULL); */
+
+	/* And then put in the new SR */
+	sre->raw = value;
+	sre->vsid = (value & 0x0fffffff);
+	sre->Ks = (value & 0x40000000) ? true : false;
+	sre->Kp = (value & 0x20000000) ? true : false;
+	sre->nx = (value & 0x10000000) ? true : false;
+
+	/* Map the new segment */
+	kvmppc_mmu_map_segment(vcpu, srnum << SID_SHIFT);
+}
+
+static void kvmppc_mmu_book3s_32_tlbie(struct kvm_vcpu *vcpu, ulong ea, bool large)
+{
+	kvmppc_mmu_pte_flush(vcpu, ea, ~0xFFFULL);
+}
+
+static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, u64 esid,
+					     u64 *vsid)
+{
+	/* In case we only have one of MSR_IR or MSR_DR set, let's put
+	   that in the real-mode context (and hope RM doesn't access
+	   high memory) */
+	switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
+	case 0:
+		*vsid = (VSID_REAL >> 16) | esid;
+		break;
+	case MSR_IR:
+		*vsid = (VSID_REAL_IR >> 16) | esid;
+		break;
+	case MSR_DR:
+		*vsid = (VSID_REAL_DR >> 16) | esid;
+		break;
+	case MSR_DR|MSR_IR:
+	{
+		ulong ea;
+		ea = esid << SID_SHIFT;
+		*vsid = find_sr(to_book3s(vcpu), ea)->vsid;
+		break;
+	}
+	default:
+		BUG();
+	}
+
+	return 0;
+}
+
+static bool kvmppc_mmu_book3s_32_is_dcbz32(struct kvm_vcpu *vcpu)
+{
+	return true;
+}
+
+
+void kvmppc_mmu_book3s_32_init(struct kvm_vcpu *vcpu)
+{
+	struct kvmppc_mmu *mmu = &vcpu->arch.mmu;
+
+	mmu->mtsrin = kvmppc_mmu_book3s_32_mtsrin;
+	mmu->mfsrin = kvmppc_mmu_book3s_32_mfsrin;
+	mmu->xlate = kvmppc_mmu_book3s_32_xlate;
+	mmu->reset_msr = kvmppc_mmu_book3s_32_reset_msr;
+	mmu->tlbie = kvmppc_mmu_book3s_32_tlbie;
+	mmu->esid_to_vsid = kvmppc_mmu_book3s_32_esid_to_vsid;
+	mmu->ea_to_vp = kvmppc_mmu_book3s_32_ea_to_vp;
+	mmu->is_dcbz32 = kvmppc_mmu_book3s_32_is_dcbz32;
+
+	mmu->slbmte = NULL;
+	mmu->slbmfee = NULL;
+	mmu->slbmfev = NULL;
+	mmu->slbie = NULL;
+	mmu->slbia = NULL;
+}
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 13/27] Add book3s_32 guest MMU
@ 2009-10-30 15:47     ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti, Olof Johansson,
	linuxppc-dev-mnsaURCQ41sdnm+yROfE0A

This patch adds an implementation for a G3/G4 MMU, so we can run G3 and
G4 guests in KVM on Book3s_64.

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v5 -> v6:

  - dprintk instead of scattered #ifdef's
  - // -> /* */
  - 80 characters per line
---
 arch/powerpc/kvm/book3s_32_mmu.c |  372 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 372 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_32_mmu.c

diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
new file mode 100644
index 0000000..faf99f2
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -0,0 +1,372 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/highmem.h>
+
+#include <asm/tlbflush.h>
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s.h>
+
+/* #define DEBUG_MMU */
+/* #define DEBUG_MMU_PTE */
+/* #define DEBUG_MMU_PTE_IP 0xfff14c40 */
+
+#ifdef DEBUG_MMU
+#define dprintk(X...) printk(KERN_INFO X)
+#else
+#define dprintk(X...) do { } while(0)
+#endif
+
+#ifdef DEBUG_PTE
+#define dprintk_pte(X...) printk(KERN_INFO X)
+#else
+#define dprintk_pte(X...) do { } while(0)
+#endif
+
+#define PTEG_FLAG_ACCESSED	0x00000100
+#define PTEG_FLAG_DIRTY		0x00000080
+
+static inline bool check_debug_ip(struct kvm_vcpu *vcpu)
+{
+#ifdef DEBUG_MMU_PTE_IP
+	return vcpu->arch.pc == DEBUG_MMU_PTE_IP;
+#else
+	return true;
+#endif
+}
+
+static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu *vcpu, gva_t eaddr,
+					  struct kvmppc_pte *pte, bool data);
+
+static struct kvmppc_sr *find_sr(struct kvmppc_vcpu_book3s *vcpu_book3s, gva_t eaddr)
+{
+	return &vcpu_book3s->sr[(eaddr >> 28) & 0xf];
+}
+
+static u64 kvmppc_mmu_book3s_32_ea_to_vp(struct kvm_vcpu *vcpu, gva_t eaddr,
+					 bool data)
+{
+	struct kvmppc_sr *sre = find_sr(to_book3s(vcpu), eaddr);
+	struct kvmppc_pte pte;
+
+	if (!kvmppc_mmu_book3s_32_xlate_bat(vcpu, eaddr, &pte, data))
+		return pte.vpage;
+
+	return (((u64)eaddr >> 12) & 0xffff) | (((u64)sre->vsid) << 16);
+}
+
+static void kvmppc_mmu_book3s_32_reset_msr(struct kvm_vcpu *vcpu)
+{
+	kvmppc_set_msr(vcpu, 0);
+}
+
+static hva_t kvmppc_mmu_book3s_32_get_pteg(struct kvmppc_vcpu_book3s *vcpu_book3s,
+				      struct kvmppc_sr *sre, gva_t eaddr,
+				      bool primary)
+{
+	u32 page, hash, pteg, htabmask;
+	hva_t r;
+
+	page = (eaddr & 0x0FFFFFFF) >> 12;
+	htabmask = ((vcpu_book3s->sdr1 & 0x1FF) << 16) | 0xFFC0;
+
+	hash = ((sre->vsid ^ page) << 6);
+	if (!primary)
+		hash = ~hash;
+	hash &= htabmask;
+
+	pteg = (vcpu_book3s->sdr1 & 0xffff0000) | hash;
+
+	dprintk("MMU: pc=0x%lx eaddr=0x%lx sdr1=0x%llx pteg=0x%x vsid=0x%x\n",
+		vcpu_book3s->vcpu.arch.pc, eaddr, vcpu_book3s->sdr1, pteg,
+		sre->vsid);
+
+	r = gfn_to_hva(vcpu_book3s->vcpu.kvm, pteg >> PAGE_SHIFT);
+	if (kvm_is_error_hva(r))
+		return r;
+	return r | (pteg & ~PAGE_MASK);
+}
+
+static u32 kvmppc_mmu_book3s_32_get_ptem(struct kvmppc_sr *sre, gva_t eaddr,
+				    bool primary)
+{
+	return ((eaddr & 0x0fffffff) >> 22) | (sre->vsid << 7) |
+	       (primary ? 0 : 0x40) | 0x80000000;
+}
+
+static int kvmppc_mmu_book3s_32_xlate_bat(struct kvm_vcpu *vcpu, gva_t eaddr,
+					  struct kvmppc_pte *pte, bool data)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_bat *bat;
+	int i;
+
+	for (i = 0; i < 8; i++) {
+		if (data)
+			bat = &vcpu_book3s->dbat[i];
+		else
+			bat = &vcpu_book3s->ibat[i];
+
+		if (vcpu->arch.msr & MSR_PR) {
+			if (!bat->vp)
+				continue;
+		} else {
+			if (!bat->vs)
+				continue;
+		}
+
+		if (check_debug_ip(vcpu))
+		{
+			dprintk_pte("%cBAT %02d: 0x%lx - 0x%x (0x%x)\n",
+				    data ? 'd' : 'i', i, eaddr, bat->bepi,
+				    bat->bepi_mask);
+		}
+		if ((eaddr & bat->bepi_mask) == bat->bepi) {
+			pte->raddr = bat->brpn | (eaddr & ~bat->bepi_mask);
+			pte->vpage = (eaddr >> 12) | VSID_BAT;
+			pte->may_read = bat->pp;
+			pte->may_write = bat->pp > 1;
+			pte->may_execute = true;
+			if (!pte->may_read) {
+				printk(KERN_INFO "BAT is not readable!\n");
+				continue;
+			}
+			if (!pte->may_write) {
+				/* let's treat r/o BATs as not-readable for now */
+				dprintk_pte("BAT is read-only!\n");
+				continue;
+			}
+
+			return 0;
+		}
+	}
+
+	return -ENOENT;
+}
+
+static int kvmppc_mmu_book3s_32_xlate_pte(struct kvm_vcpu *vcpu, gva_t eaddr,
+				     struct kvmppc_pte *pte, bool data,
+				     bool primary)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_sr *sre;
+	hva_t ptegp;
+	u32 pteg[16];
+	u64 ptem = 0;
+	int i;
+	int found = 0;
+
+	sre = find_sr(vcpu_book3s, eaddr);
+
+	dprintk_pte("SR 0x%lx: vsid=0x%x, raw=0x%x\n", eaddr >> 28,
+		    sre->vsid, sre->raw);
+
+	pte->vpage = kvmppc_mmu_book3s_32_ea_to_vp(vcpu, eaddr, data);
+
+	ptegp = kvmppc_mmu_book3s_32_get_pteg(vcpu_book3s, sre, eaddr, primary);
+	if (kvm_is_error_hva(ptegp)) {
+		printk(KERN_INFO "KVM: Invalid PTEG!\n");
+		goto no_page_found;
+	}
+
+	ptem = kvmppc_mmu_book3s_32_get_ptem(sre, eaddr, primary);
+
+	if(copy_from_user(pteg, (void __user *)ptegp, sizeof(pteg))) {
+		printk(KERN_ERR "KVM: Can't copy data from 0x%lx!\n", ptegp);
+		goto no_page_found;
+	}
+
+	for (i=0; i<16; i+=2) {
+		if (ptem == pteg[i]) {
+			u8 pp;
+
+			pte->raddr = (pteg[i+1] & ~(0xFFFULL)) | (eaddr & 0xFFF);
+			pp = pteg[i+1] & 3;
+
+			if ((sre->Kp &&  (vcpu->arch.msr & MSR_PR)) ||
+			    (sre->Ks && !(vcpu->arch.msr & MSR_PR)))
+				pp |= 4;
+
+			pte->may_write = false;
+			pte->may_read = false;
+			pte->may_execute = true;
+			switch (pp) {
+				case 0:
+				case 1:
+				case 2:
+				case 6:
+					pte->may_write = true;
+				case 3:
+				case 5:
+				case 7:
+					pte->may_read = true;
+					break;
+			}
+
+			if ( !pte->may_read )
+				continue;
+
+			dprintk_pte("MMU: Found PTE -> %x %x - %x\n",
+				    pteg[i], pteg[i+1], pp);
+			found = 1;
+			break;
+		}
+	}
+
+	/* Update PTE C and A bits, so the guest's swapper knows we used the
+	   page */
+	if (found) {
+		u32 oldpte = pteg[i+1];
+
+		if (pte->may_read)
+			pteg[i+1] |= PTEG_FLAG_ACCESSED;
+		if (pte->may_write)
+			pteg[i+1] |= PTEG_FLAG_DIRTY;
+		else
+			dprintk_pte("KVM: Mapping read-only page!\n");
+
+		/* Write back into the PTEG */
+		if (pteg[i+1] != oldpte)
+			copy_to_user((void __user *)ptegp, pteg, sizeof(pteg));
+
+		return 0;
+	}
+
+no_page_found:
+
+	if (check_debug_ip(vcpu)) {
+		dprintk_pte("KVM MMU: No PTE found (sdr1=0x%llx ptegp=0x%lx)\n",
+			    to_book3s(vcpu)->sdr1, ptegp);
+		for (i=0; i<16; i+=2) {
+			dprintk_pte("   %02d: 0x%x - 0x%x (0x%llx)\n",
+				    i, pteg[i], pteg[i+1], ptem);
+		}
+	}
+
+	return -ENOENT;
+}
+
+static int kvmppc_mmu_book3s_32_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
+				      struct kvmppc_pte *pte, bool data)
+{
+	int r;
+
+	pte->eaddr = eaddr;
+	r = kvmppc_mmu_book3s_32_xlate_bat(vcpu, eaddr, pte, data);
+	if (r < 0)
+	       r = kvmppc_mmu_book3s_32_xlate_pte(vcpu, eaddr, pte, data, true);
+	if (r < 0)
+	       r = kvmppc_mmu_book3s_32_xlate_pte(vcpu, eaddr, pte, data, false);
+
+	return r;
+}
+
+
+static u32 kvmppc_mmu_book3s_32_mfsrin(struct kvm_vcpu *vcpu, u32 srnum)
+{
+	return to_book3s(vcpu)->sr[srnum].raw;
+}
+
+static void kvmppc_mmu_book3s_32_mtsrin(struct kvm_vcpu *vcpu, u32 srnum,
+					ulong value)
+{
+	struct kvmppc_sr *sre;
+
+	sre = &to_book3s(vcpu)->sr[srnum];
+
+	/* Flush any left-over shadows from the previous SR */
+
+	/* XXX Not necessary? */
+	/* kvmppc_mmu_pte_flush(vcpu, ((u64)sre->vsid) << 28, 0xf0000000ULL); */
+
+	/* And then put in the new SR */
+	sre->raw = value;
+	sre->vsid = (value & 0x0fffffff);
+	sre->Ks = (value & 0x40000000) ? true : false;
+	sre->Kp = (value & 0x20000000) ? true : false;
+	sre->nx = (value & 0x10000000) ? true : false;
+
+	/* Map the new segment */
+	kvmppc_mmu_map_segment(vcpu, srnum << SID_SHIFT);
+}
+
+static void kvmppc_mmu_book3s_32_tlbie(struct kvm_vcpu *vcpu, ulong ea, bool large)
+{
+	kvmppc_mmu_pte_flush(vcpu, ea, ~0xFFFULL);
+}
+
+static int kvmppc_mmu_book3s_32_esid_to_vsid(struct kvm_vcpu *vcpu, u64 esid,
+					     u64 *vsid)
+{
+	/* In case we only have one of MSR_IR or MSR_DR set, let's put
+	   that in the real-mode context (and hope RM doesn't access
+	   high memory) */
+	switch (vcpu->arch.msr & (MSR_DR|MSR_IR)) {
+	case 0:
+		*vsid = (VSID_REAL >> 16) | esid;
+		break;
+	case MSR_IR:
+		*vsid = (VSID_REAL_IR >> 16) | esid;
+		break;
+	case MSR_DR:
+		*vsid = (VSID_REAL_DR >> 16) | esid;
+		break;
+	case MSR_DR|MSR_IR:
+	{
+		ulong ea;
+		ea = esid << SID_SHIFT;
+		*vsid = find_sr(to_book3s(vcpu), ea)->vsid;
+		break;
+	}
+	default:
+		BUG();
+	}
+
+	return 0;
+}
+
+static bool kvmppc_mmu_book3s_32_is_dcbz32(struct kvm_vcpu *vcpu)
+{
+	return true;
+}
+
+
+void kvmppc_mmu_book3s_32_init(struct kvm_vcpu *vcpu)
+{
+	struct kvmppc_mmu *mmu = &vcpu->arch.mmu;
+
+	mmu->mtsrin = kvmppc_mmu_book3s_32_mtsrin;
+	mmu->mfsrin = kvmppc_mmu_book3s_32_mfsrin;
+	mmu->xlate = kvmppc_mmu_book3s_32_xlate;
+	mmu->reset_msr = kvmppc_mmu_book3s_32_reset_msr;
+	mmu->tlbie = kvmppc_mmu_book3s_32_tlbie;
+	mmu->esid_to_vsid = kvmppc_mmu_book3s_32_esid_to_vsid;
+	mmu->ea_to_vp = kvmppc_mmu_book3s_32_ea_to_vp;
+	mmu->is_dcbz32 = kvmppc_mmu_book3s_32_is_dcbz32;
+
+	mmu->slbmte = NULL;
+	mmu->slbmfee = NULL;
+	mmu->slbmfev = NULL;
+	mmu->slbie = NULL;
+	mmu->slbia = NULL;
+}
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 14/27] Add book3s_64 specific opcode emulation
  2009-10-30 15:47 ` Alexander Graf
  (?)
@ 2009-10-30 15:47     ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti, Olof Johansson,
	linuxppc-dev-mnsaURCQ41sdnm+yROfE0A

There are generic parts of PowerPC that can be shared across all
implementations and specific parts that only apply to BookE or desktop PPCs.

This patch adds emulation for desktop specific opcodes that don't apply
to BookE CPUs.
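
Dispatch keys off the standard PowerPC instruction layout: a 6-bit primary
opcode in the top bits and, for the 19 and 31 families, a 10-bit extended
opcode. Roughly what the get_op()/get_xop() helpers from asm/disassemble.h
extract (simplified sketch, not the kernel's exact definitions):

#include <stdint.h>

/* Sketch only: simplified decode helpers for the dispatch below. */
static inline unsigned int example_get_op(uint32_t inst)
{
	return inst >> 26;		/* primary opcode, bits 0..5 */
}

static inline unsigned int example_get_xop(uint32_t inst)
{
	return (inst >> 1) & 0x3ff;	/* extended opcode, bits 21..30 */
}

slbmte, for example, shows up as primary opcode 31 with extended opcode 402,
and anything the switch below does not recognize returns EMULATE_FAIL.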

Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>

---

v5 -> v6:

  - // -> /* */
---
 arch/powerpc/kvm/book3s_64_emulate.c |  337 ++++++++++++++++++++++++++++++++++
 1 files changed, 337 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_emulate.c

diff --git a/arch/powerpc/kvm/book3s_64_emulate.c b/arch/powerpc/kvm/book3s_64_emulate.c
new file mode 100644
index 0000000..c343e67
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_emulate.c
@@ -0,0 +1,337 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
+ */
+
+#include <asm/kvm_ppc.h>
+#include <asm/disassemble.h>
+#include <asm/kvm_book3s.h>
+#include <asm/reg.h>
+
+#define OP_19_XOP_RFID		18
+#define OP_19_XOP_RFI		50
+
+#define OP_31_XOP_MFMSR		83
+#define OP_31_XOP_MTMSR		146
+#define OP_31_XOP_MTMSRD	178
+#define OP_31_XOP_MTSRIN	242
+#define OP_31_XOP_TLBIEL	274
+#define OP_31_XOP_TLBIE		306
+#define OP_31_XOP_SLBMTE	402
+#define OP_31_XOP_SLBIE		434
+#define OP_31_XOP_SLBIA		498
+#define OP_31_XOP_MFSRIN	659
+#define OP_31_XOP_SLBMFEV	851
+#define OP_31_XOP_EIOIO		854
+#define OP_31_XOP_SLBMFEE	915
+
+/* DCBZ is actually 1014, but we patch it to 1010 so we get a trap */
+#define OP_31_XOP_DCBZ		1010
+
+int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
+                           unsigned int inst, int *advance)
+{
+	int emulated = EMULATE_DONE;
+
+	switch (get_op(inst)) {
+	case 19:
+		switch (get_xop(inst)) {
+		case OP_19_XOP_RFID:
+		case OP_19_XOP_RFI:
+			vcpu->arch.pc = vcpu->arch.srr0;
+			kvmppc_set_msr(vcpu, vcpu->arch.srr1);
+			*advance = 0;
+			break;
+
+		default:
+			emulated = EMULATE_FAIL;
+			break;
+		}
+		break;
+	case 31:
+		switch (get_xop(inst)) {
+		case OP_31_XOP_MFMSR:
+			vcpu->arch.gpr[get_rt(inst)] = vcpu->arch.msr;
+			break;
+		case OP_31_XOP_MTMSRD:
+		{
+			ulong rs = vcpu->arch.gpr[get_rs(inst)];
+			if (inst & 0x10000) {
+				vcpu->arch.msr &= ~(MSR_RI | MSR_EE);
+				vcpu->arch.msr |= rs & (MSR_RI | MSR_EE);
+			} else
+				kvmppc_set_msr(vcpu, rs);
+			break;
+		}
+		case OP_31_XOP_MTMSR:
+			kvmppc_set_msr(vcpu, vcpu->arch.gpr[get_rs(inst)]);
+			break;
+		case OP_31_XOP_MFSRIN:
+		{
+			int srnum;
+
+			srnum = (vcpu->arch.gpr[get_rb(inst)] >> 28) & 0xf;
+			if (vcpu->arch.mmu.mfsrin) {
+				u32 sr;
+				sr = vcpu->arch.mmu.mfsrin(vcpu, srnum);
+				vcpu->arch.gpr[get_rt(inst)] = sr;
+			}
+			break;
+		}
+		case OP_31_XOP_MTSRIN:
+			vcpu->arch.mmu.mtsrin(vcpu,
+				(vcpu->arch.gpr[get_rb(inst)] >> 28) & 0xf,
+				vcpu->arch.gpr[get_rs(inst)]);
+			break;
+		case OP_31_XOP_TLBIE:
+		case OP_31_XOP_TLBIEL:
+		{
+			bool large = (inst & 0x00200000) ? true : false;
+			ulong addr = vcpu->arch.gpr[get_rb(inst)];
+			vcpu->arch.mmu.tlbie(vcpu, addr, large);
+			break;
+		}
+		case OP_31_XOP_EIOIO:
+			break;
+		case OP_31_XOP_SLBMTE:
+			if (!vcpu->arch.mmu.slbmte)
+				return EMULATE_FAIL;
+
+			vcpu->arch.mmu.slbmte(vcpu, vcpu->arch.gpr[get_rs(inst)],
+						vcpu->arch.gpr[get_rb(inst)]);
+			break;
+		case OP_31_XOP_SLBIE:
+			if (!vcpu->arch.mmu.slbie)
+				return EMULATE_FAIL;
+
+			vcpu->arch.mmu.slbie(vcpu, vcpu->arch.gpr[get_rb(inst)]);
+			break;
+		case OP_31_XOP_SLBIA:
+			if (!vcpu->arch.mmu.slbia)
+				return EMULATE_FAIL;
+
+			vcpu->arch.mmu.slbia(vcpu);
+			break;
+		case OP_31_XOP_SLBMFEE:
+			if (!vcpu->arch.mmu.slbmfee) {
+				emulated = EMULATE_FAIL;
+			} else {
+				ulong t, rb;
+
+				rb = vcpu->arch.gpr[get_rb(inst)];
+				t = vcpu->arch.mmu.slbmfee(vcpu, rb);
+				vcpu->arch.gpr[get_rt(inst)] = t;
+			}
+			break;
+		case OP_31_XOP_SLBMFEV:
+			if (!vcpu->arch.mmu.slbmfev) {
+				emulated = EMULATE_FAIL;
+			} else {
+				ulong t, rb;
+
+				rb = vcpu->arch.gpr[get_rb(inst)];
+				t = vcpu->arch.mmu.slbmfev(vcpu, rb);
+				vcpu->arch.gpr[get_rt(inst)] = t;
+			}
+			break;
+		case OP_31_XOP_DCBZ:
+		{
+			ulong rb =  vcpu->arch.gpr[get_rb(inst)];
+			ulong ra = 0;
+			ulong addr;
+			u32 zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
+
+			if (get_ra(inst))
+				ra = vcpu->arch.gpr[get_ra(inst)];
+
+			addr = (ra + rb) & ~31ULL;
+			if (!(vcpu->arch.msr & MSR_SF))
+				addr &= 0xffffffff;
+
+			if (kvmppc_st(vcpu, addr, 32, zeros)) {
+				vcpu->arch.dear = addr;
+				vcpu->arch.fault_dear = addr;
+				to_book3s(vcpu)->dsisr = DSISR_PROTFAULT |
+						      DSISR_ISSTORE;
+				kvmppc_book3s_queue_irqprio(vcpu,
+					BOOK3S_INTERRUPT_DATA_STORAGE);
+				kvmppc_mmu_pte_flush(vcpu, addr, ~0xFFFULL);
+			}
+
+			break;
+		}
+		default:
+			emulated = EMULATE_FAIL;
+		}
+		break;
+	default:
+		emulated = EMULATE_FAIL;
+	}
+
+	return emulated;
+}
+
+static void kvmppc_write_bat(struct kvm_vcpu *vcpu, int sprn, u64 val)
+{
+	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
+	struct kvmppc_bat *bat;
+
+	switch (sprn) {
+	case SPRN_IBAT0U ... SPRN_IBAT3L:
+		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT0U) / 2];
+		break;
+	case SPRN_IBAT4U ... SPRN_IBAT7L:
+		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT4U) / 2];
+		break;
+	case SPRN_DBAT0U ... SPRN_DBAT3L:
+		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT0U) / 2];
+		break;
+	case SPRN_DBAT4U ... SPRN_DBAT7L:
+		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT4U) / 2];
+		break;
+	default:
+		BUG();
+	}
+
+	if (!(sprn % 2)) {
+		/* Upper BAT */
+		u32 bl = (val >> 2) & 0x7ff;
+		bat->bepi_mask = (~bl << 17);
+		bat->bepi = val & 0xfffe0000;
+		bat->vs = (val & 2) ? 1 : 0;
+		bat->vp = (val & 1) ? 1 : 0;
+	} else {
+		/* Lower BAT */
+		bat->brpn = val & 0xfffe0000;
+		bat->wimg = (val >> 3) & 0xf;
+		bat->pp = val & 3;
+	}
+}
+
+int kvmppc_core_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, int rs)
+{
+	int emulated = EMULATE_DONE;
+
+	switch (sprn) {
+	case SPRN_SDR1:
+		to_book3s(vcpu)->sdr1 = vcpu->arch.gpr[rs];
+		break;
+	case SPRN_DSISR:
+		to_book3s(vcpu)->dsisr = vcpu->arch.gpr[rs];
+		break;
+	case SPRN_DAR:
+		vcpu->arch.dear = vcpu->arch.gpr[rs];
+		break;
+	case SPRN_HIOR:
+		to_book3s(vcpu)->hior = vcpu->arch.gpr[rs];
+		break;
+	case SPRN_IBAT0U ... SPRN_IBAT3L:
+	case SPRN_IBAT4U ... SPRN_IBAT7L:
+	case SPRN_DBAT0U ... SPRN_DBAT3L:
+	case SPRN_DBAT4U ... SPRN_DBAT7L:
+		kvmppc_write_bat(vcpu, sprn, vcpu->arch.gpr[rs]);
+		/* BAT writes happen so rarely that we're ok to flush
+		 * everything here */
+		kvmppc_mmu_pte_flush(vcpu, 0, 0);
+		break;
+	case SPRN_HID0:
+		to_book3s(vcpu)->hid[0] = vcpu->arch.gpr[rs];
+		break;
+	case SPRN_HID1:
+		to_book3s(vcpu)->hid[1] = vcpu->arch.gpr[rs];
+		break;
+	case SPRN_HID2:
+		to_book3s(vcpu)->hid[2] = vcpu->arch.gpr[rs];
+		break;
+	case SPRN_HID4:
+		to_book3s(vcpu)->hid[4] = vcpu->arch.gpr[rs];
+		break;
+	case SPRN_HID5:
+		to_book3s(vcpu)->hid[5] = vcpu->arch.gpr[rs];
+		/* guest HID5 set can change is_dcbz32 */
+		if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
+		    (mfmsr() & MSR_HV))
+			vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
+		break;
+	case SPRN_ICTC:
+	case SPRN_THRM1:
+	case SPRN_THRM2:
+	case SPRN_THRM3:
+	case SPRN_CTRLF:
+	case SPRN_CTRLT:
+		break;
+	default:
+		printk(KERN_INFO "KVM: invalid SPR write: %d\n", sprn);
+#ifndef DEBUG_SPR
+		emulated = EMULATE_FAIL;
+#endif
+		break;
+	}
+
+	return emulated;
+}
+
+int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt)
+{
+	int emulated = EMULATE_DONE;
+
+	switch (sprn) {
+	case SPRN_SDR1:
+		vcpu->arch.gpr[rt] = to_book3s(vcpu)->sdr1;
+		break;
+	case SPRN_DSISR:
+		vcpu->arch.gpr[rt] = to_book3s(vcpu)->dsisr;
+		break;
+	case SPRN_DAR:
+		vcpu->arch.gpr[rt] = vcpu->arch.dear;
+		break;
+	case SPRN_HIOR:
+		vcpu->arch.gpr[rt] = to_book3s(vcpu)->hior;
+		break;
+	case SPRN_HID0:
+		vcpu->arch.gpr[rt] = to_book3s(vcpu)->hid[0];
+		break;
+	case SPRN_HID1:
+		vcpu->arch.gpr[rt] = to_book3s(vcpu)->hid[1];
+		break;
+	case SPRN_HID2:
+		vcpu->arch.gpr[rt] = to_book3s(vcpu)->hid[2];
+		break;
+	case SPRN_HID4:
+		vcpu->arch.gpr[rt] = to_book3s(vcpu)->hid[4];
+		break;
+	case SPRN_HID5:
+		vcpu->arch.gpr[rt] = to_book3s(vcpu)->hid[5];
+		break;
+	case SPRN_THRM1:
+	case SPRN_THRM2:
+	case SPRN_THRM3:
+	case SPRN_CTRLF:
+	case SPRN_CTRLT:
+		vcpu->arch.gpr[rt] = 0;
+		break;
+	default:
+		printk(KERN_INFO "KVM: invalid SPR read: %d\n", sprn);
+#ifndef DEBUG_SPR
+		emulated = EMULATE_FAIL;
+#endif
+		break;
+	}
+
+	return emulated;
+}
+
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 15/27] Add mfdec emulation
  2009-10-30 15:47 ` Alexander Graf
@ 2009-10-30 15:47     ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti, Olof Johansson,
	linuxppc-dev-mnsaURCQ41sdnm+yROfE0A

We support setting the DEC to a certain value right now. Doing that arms the
CPU-local timer that later delivers the decrementer interrupt to the guest.

But there's also an mfdec instruction that enables the OS to read the decrementer.

This is required at least by all desktop and server PowerPC Linux kernels. It
can't really hurt to allow embedded ones to do it as well though.
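
The calculation, as a rough sketch that is not part of the patch itself (field
names follow the hunk below): remember the timebase at the moment DEC was last
written, and on every read return the programmed value minus the ticks that
have elapsed since then.

	/* Illustrative only -- this is what the SPRN_DEC case below computes. */
	static u32 emulated_mfdec(struct kvm_vcpu *vcpu)
	{
		u64 elapsed = mftb() - vcpu->arch.dec_jiffies;

		return vcpu->arch.dec - elapsed;
	}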

Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
---
 arch/powerpc/kvm/emulate.c |   13 ++++++++++++-
 1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index 7737146..50d411d 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -66,12 +66,14 @@
 
 void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
 {
+	unsigned long nr_jiffies;
+
 	if (vcpu->arch.tcr & TCR_DIE) {
 		/* The decrementer ticks at the same rate as the timebase, so
 		 * that's how we convert the guest DEC value to the number of
 		 * host ticks. */
-		unsigned long nr_jiffies;
 
+		vcpu->arch.dec_jiffies = mftb();
 		nr_jiffies = vcpu->arch.dec / tb_ticks_per_jiffy;
 		mod_timer(&vcpu->arch.dec_timer,
 		          get_jiffies_64() + nr_jiffies);
@@ -211,6 +213,15 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
 			/* Note: SPRG4-7 are user-readable, so we don't get
 			 * a trap. */
 
+			case SPRN_DEC:
+			{
+				u64 jd = mftb() - vcpu->arch.dec_jiffies;
+				vcpu->arch.gpr[rt] = vcpu->arch.dec - jd;
+#ifdef DEBUG_EMUL
+				printk(KERN_INFO "mfDEC: %x - %llx = %lx\n", vcpu->arch.dec, jd, vcpu->arch.gpr[rt]);
+#endif
+				break;
+			}
 			default:
 				emulated = kvmppc_core_emulate_mfspr(vcpu, sprn, rt);
 				if (emulated == EMULATE_FAIL) {
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 16/27] Add desktop PowerPC specific emulation
  2009-10-30 15:47 ` Alexander Graf
@ 2009-10-30 15:47     ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm-u79uwXL29TY76Z2rM5mHXA
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips-l3A5Bk7waGM,
	Marcelo Tosatti, Olof Johansson,
	linuxppc-dev-mnsaURCQ41sdnm+yROfE0A

A few opcodes behave differently on desktop and embedded PowerPC cores.
In order to reflect those differences, let's add some #ifdef code to emulate.c.

We could probably also handle them in the core specific emulation files, but I
would prefer to reuse as much code as possible.
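
One concrete example of the changed behaviour: after this patch an mfspr from
the PVR returns the PVR value user space configured for the vcpu instead of
the host's real PVR. A hypothetical guest-side read that would end up in this
emulation path (illustrative, not from the patch):

	static unsigned long guest_read_pvr(void)
	{
		unsigned long pvr;

		/* PVR is SPR 287; reading it is privileged, so under KVM the
		 * access traps and is handled by the switch below. */
		asm volatile("mfspr %0, 287" : "=r" (pvr));
		return pvr;
	}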

Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>

---

v4 -> v5:

  - use get_tb instead of mftb
  - make ppc32 and ppc64 emulation share more code
---
 arch/powerpc/kvm/emulate.c |   49 +++++++++++++++++++++++++++++++++++---------
 1 files changed, 39 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index 50d411d..1ec5e07 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -32,6 +32,7 @@
 #include "trace.h"
 
 #define OP_TRAP 3
+#define OP_TRAP_64 2
 
 #define OP_31_XOP_LWZX      23
 #define OP_31_XOP_LBZX      87
@@ -64,16 +65,36 @@
 #define OP_STH  44
 #define OP_STHU 45
 
+#ifdef CONFIG_PPC64
+static int kvmppc_dec_enabled(struct kvm_vcpu *vcpu)
+{
+	return 1;
+}
+#else
+static int kvmppc_dec_enabled(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.tcr & TCR_DIE;
+}
+#endif
+
 void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
 {
 	unsigned long nr_jiffies;
 
-	if (vcpu->arch.tcr & TCR_DIE) {
+#ifdef CONFIG_PPC64
+	/* POWER4+ triggers a dec interrupt if the value is < 0 */
+	if (vcpu->arch.dec & 0x80000000) {
+		del_timer(&vcpu->arch.dec_timer);
+		kvmppc_core_queue_dec(vcpu);
+		return;
+	}
+#endif
+	if (kvmppc_dec_enabled(vcpu)) {
 		/* The decrementer ticks at the same rate as the timebase, so
 		 * that's how we convert the guest DEC value to the number of
 		 * host ticks. */
 
-		vcpu->arch.dec_jiffies = mftb();
+		vcpu->arch.dec_jiffies = get_tb();
 		nr_jiffies = vcpu->arch.dec / tb_ticks_per_jiffy;
 		mod_timer(&vcpu->arch.dec_timer,
 		          get_jiffies_64() + nr_jiffies);
@@ -113,9 +134,15 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
 	/* this default type might be overwritten by subcategories */
 	kvmppc_set_exit_type(vcpu, EMULATED_INST_EXITS);
 
+	pr_debug(KERN_INFO "Emulating opcode %d / %d\n", get_op(inst), get_xop(inst));
+
 	switch (get_op(inst)) {
 	case OP_TRAP:
+#ifdef CONFIG_PPC64
+	case OP_TRAP_64:
+#else
 		vcpu->arch.esr |= ESR_PTR;
+#endif
 		kvmppc_core_queue_program(vcpu);
 		advance = 0;
 		break;
@@ -190,17 +217,19 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
 			case SPRN_SRR1:
 				vcpu->arch.gpr[rt] = vcpu->arch.srr1; break;
 			case SPRN_PVR:
-				vcpu->arch.gpr[rt] = mfspr(SPRN_PVR); break;
+				vcpu->arch.gpr[rt] = vcpu->arch.pvr; break;
 			case SPRN_PIR:
-				vcpu->arch.gpr[rt] = mfspr(SPRN_PIR); break;
+				vcpu->arch.gpr[rt] = vcpu->vcpu_id; break;
+			case SPRN_MSSSR0:
+				vcpu->arch.gpr[rt] = 0; break;
 
 			/* Note: mftb and TBRL/TBWL are user-accessible, so
 			 * the guest can always access the real TB anyways.
 			 * In fact, we probably will never see these traps. */
 			case SPRN_TBWL:
-				vcpu->arch.gpr[rt] = mftbl(); break;
+				vcpu->arch.gpr[rt] = get_tb() >> 32; break;
 			case SPRN_TBWU:
-				vcpu->arch.gpr[rt] = mftbu(); break;
+				vcpu->arch.gpr[rt] = get_tb(); break;
 
 			case SPRN_SPRG0:
 				vcpu->arch.gpr[rt] = vcpu->arch.sprg0; break;
@@ -215,11 +244,9 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
 
 			case SPRN_DEC:
 			{
-				u64 jd = mftb() - vcpu->arch.dec_jiffies;
+				u64 jd = get_tb() - vcpu->arch.dec_jiffies;
 				vcpu->arch.gpr[rt] = vcpu->arch.dec - jd;
-#ifdef DEBUG_EMUL
-				printk(KERN_INFO "mfDEC: %x - %llx = %lx\n", vcpu->arch.dec, jd, vcpu->arch.gpr[rt]);
-#endif
+				pr_debug(KERN_INFO "mfDEC: %x - %llx = %lx\n", vcpu->arch.dec, jd, vcpu->arch.gpr[rt]);
 				break;
 			}
 			default:
@@ -271,6 +298,8 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
 			case SPRN_TBWL: break;
 			case SPRN_TBWU: break;
 
+			case SPRN_MSSSR0: break;
+
 			case SPRN_DEC:
 				vcpu->arch.dec = vcpu->arch.gpr[rs];
 				kvmppc_emulate_dec(vcpu);
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 17/27] Make head_64.S aware of KVM real mode code
  2009-10-30 15:47 ` Alexander Graf
@ 2009-10-30 15:47   ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti,
	Olof Johansson, linuxppc-dev

We need to run some KVM trampoline code in real mode. Unfortunately, real mode
only covers 8MB on Cell so we need to squeeze ourselves as low as possible.

Also, we need to trap interrupts to get us back from guest state to host state
without telling Linux about it.

This patch adds interrupt traps and includes the KVM code that requires real
mode in the real mode parts of Linux.
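
Conceptually, the DO_KVM hook added to each vector boils down to something
like the following C pseudocode (the real hook is an assembly macro, presumably
the one provided by asm/kvm_book3s_64_asm.h which head_64.S now includes; the
names below are purely illustrative and not taken from the patch):

	static void do_kvm_hook(struct paca_struct *paca, int trap_vector)
	{
		if (paca->kvm_in_guest)			/* hypothetical PACA flag */
			jump_to_kvm_handler(trap_vector);	/* hypothetical helper */
		/* otherwise fall through to the regular Linux handler */
	}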

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/include/asm/exception-64s.h |    2 ++
 arch/powerpc/kernel/exceptions-64s.S     |    8 ++++++++
 arch/powerpc/kernel/head_64.S            |    7 +++++++
 3 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index a98653b..57c4000 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -147,6 +147,7 @@
 	.globl label##_pSeries;				\
 label##_pSeries:					\
 	HMT_MEDIUM;					\
+	DO_KVM	n;					\
 	mtspr	SPRN_SPRG_SCRATCH0,r13;		/* save r13 */	\
 	EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, label##_common)
 
@@ -170,6 +171,7 @@ label##_pSeries:					\
 	.globl label##_pSeries;						\
 label##_pSeries:							\
 	HMT_MEDIUM;							\
+	DO_KVM	n;							\
 	mtspr	SPRN_SPRG_SCRATCH0,r13;	/* save r13 */			\
 	mfspr	r13,SPRN_SPRG_PACA;	/* get paca address into r13 */	\
 	std	r9,PACA_EXGEN+EX_R9(r13);	/* save r9, r10 */	\
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 1808876..fc3ead0 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -41,6 +41,7 @@ __start_interrupts:
 	. = 0x200
 _machine_check_pSeries:
 	HMT_MEDIUM
+	DO_KVM	0x200
 	mtspr	SPRN_SPRG_SCRATCH0,r13		/* save r13 */
 	EXCEPTION_PROLOG_PSERIES(PACA_EXMC, machine_check_common)
 
@@ -48,6 +49,7 @@ _machine_check_pSeries:
 	.globl data_access_pSeries
 data_access_pSeries:
 	HMT_MEDIUM
+	DO_KVM	0x300
 	mtspr	SPRN_SPRG_SCRATCH0,r13
 BEGIN_FTR_SECTION
 	mfspr	r13,SPRN_SPRG_PACA
@@ -77,6 +79,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_SLB)
 	.globl data_access_slb_pSeries
 data_access_slb_pSeries:
 	HMT_MEDIUM
+	DO_KVM	0x380
 	mtspr	SPRN_SPRG_SCRATCH0,r13
 	mfspr	r13,SPRN_SPRG_PACA		/* get paca address into r13 */
 	std	r3,PACA_EXSLB+EX_R3(r13)
@@ -115,6 +118,7 @@ data_access_slb_pSeries:
 	.globl instruction_access_slb_pSeries
 instruction_access_slb_pSeries:
 	HMT_MEDIUM
+	DO_KVM	0x480
 	mtspr	SPRN_SPRG_SCRATCH0,r13
 	mfspr	r13,SPRN_SPRG_PACA		/* get paca address into r13 */
 	std	r3,PACA_EXSLB+EX_R3(r13)
@@ -154,6 +158,7 @@ instruction_access_slb_pSeries:
 	.globl	system_call_pSeries
 system_call_pSeries:
 	HMT_MEDIUM
+	DO_KVM	0xc00
 BEGIN_FTR_SECTION
 	cmpdi	r0,0x1ebe
 	beq-	1f
@@ -186,12 +191,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	 * trickery is thus necessary
 	 */
 	. = 0xf00
+	DO_KVM	0xf00
 	b	performance_monitor_pSeries
 
 	. = 0xf20
+	DO_KVM	0xf20
 	b	altivec_unavailable_pSeries
 
 	. = 0xf40
+	DO_KVM	0xf40
 	b	vsx_unavailable_pSeries
 
 #ifdef CONFIG_CBE_RAS
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index c38afdb..9258074 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -37,6 +37,7 @@
 #include <asm/firmware.h>
 #include <asm/page_64.h>
 #include <asm/irqflags.h>
+#include <asm/kvm_book3s_64_asm.h>
 
 /* The physical memory is layed out such that the secondary processor
  * spin code sits at 0x0000...0x00ff. On server, the vectors follow
@@ -165,6 +166,12 @@ exception_marker:
 #include "exceptions-64s.S"
 #endif
 
+/* KVM trampoline code needs to be close to the interrupt handlers */
+
+#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
+#include "../kvm/book3s_64_rmhandlers.S"
+#endif
+
 _GLOBAL(generic_secondary_thread_init)
 	mr	r24,r3
 
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 17/27] Make head_64.S aware of KVM real mode code
@ 2009-10-30 15:47   ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Kevin Wolf, Arnd Bergmann, Hollis Blanchard, Marcelo Tosatti,
	kvm-ppc, linuxppc-dev, Avi Kivity, bphilips, Olof Johansson

We need to run some KVM trampoline code in real mode. Unfortunately, real mode
only covers 8MB on Cell so we need to squeeze ourselves as low as possible.

Also, we need to trap interrupts to get us back from guest state to host state
without telling Linux about it.

This patch adds interrupt traps and includes the KVM code that requires real
mode in the real mode parts of Linux.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/include/asm/exception-64s.h |    2 ++
 arch/powerpc/kernel/exceptions-64s.S     |    8 ++++++++
 arch/powerpc/kernel/head_64.S            |    7 +++++++
 3 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index a98653b..57c4000 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -147,6 +147,7 @@
 	.globl label##_pSeries;				\
 label##_pSeries:					\
 	HMT_MEDIUM;					\
+	DO_KVM	n;					\
 	mtspr	SPRN_SPRG_SCRATCH0,r13;		/* save r13 */	\
 	EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, label##_common)
 
@@ -170,6 +171,7 @@ label##_pSeries:					\
 	.globl label##_pSeries;						\
 label##_pSeries:							\
 	HMT_MEDIUM;							\
+	DO_KVM	n;							\
 	mtspr	SPRN_SPRG_SCRATCH0,r13;	/* save r13 */			\
 	mfspr	r13,SPRN_SPRG_PACA;	/* get paca address into r13 */	\
 	std	r9,PACA_EXGEN+EX_R9(r13);	/* save r9, r10 */	\
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 1808876..fc3ead0 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -41,6 +41,7 @@ __start_interrupts:
 	. = 0x200
 _machine_check_pSeries:
 	HMT_MEDIUM
+	DO_KVM	0x200
 	mtspr	SPRN_SPRG_SCRATCH0,r13		/* save r13 */
 	EXCEPTION_PROLOG_PSERIES(PACA_EXMC, machine_check_common)
 
@@ -48,6 +49,7 @@ _machine_check_pSeries:
 	.globl data_access_pSeries
 data_access_pSeries:
 	HMT_MEDIUM
+	DO_KVM	0x300
 	mtspr	SPRN_SPRG_SCRATCH0,r13
 BEGIN_FTR_SECTION
 	mfspr	r13,SPRN_SPRG_PACA
@@ -77,6 +79,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_SLB)
 	.globl data_access_slb_pSeries
 data_access_slb_pSeries:
 	HMT_MEDIUM
+	DO_KVM	0x380
 	mtspr	SPRN_SPRG_SCRATCH0,r13
 	mfspr	r13,SPRN_SPRG_PACA		/* get paca address into r13 */
 	std	r3,PACA_EXSLB+EX_R3(r13)
@@ -115,6 +118,7 @@ data_access_slb_pSeries:
 	.globl instruction_access_slb_pSeries
 instruction_access_slb_pSeries:
 	HMT_MEDIUM
+	DO_KVM	0x480
 	mtspr	SPRN_SPRG_SCRATCH0,r13
 	mfspr	r13,SPRN_SPRG_PACA		/* get paca address into r13 */
 	std	r3,PACA_EXSLB+EX_R3(r13)
@@ -154,6 +158,7 @@ instruction_access_slb_pSeries:
 	.globl	system_call_pSeries
 system_call_pSeries:
 	HMT_MEDIUM
+	DO_KVM	0xc00
 BEGIN_FTR_SECTION
 	cmpdi	r0,0x1ebe
 	beq-	1f
@@ -186,12 +191,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	 * trickery is thus necessary
 	 */
 	. = 0xf00
+	DO_KVM	0xf00
 	b	performance_monitor_pSeries
 
 	. = 0xf20
+	DO_KVM	0xf20
 	b	altivec_unavailable_pSeries
 
 	. = 0xf40
+	DO_KVM	0xf40
 	b	vsx_unavailable_pSeries
 
 #ifdef CONFIG_CBE_RAS
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index c38afdb..9258074 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -37,6 +37,7 @@
 #include <asm/firmware.h>
 #include <asm/page_64.h>
 #include <asm/irqflags.h>
+#include <asm/kvm_book3s_64_asm.h>
 
 /* The physical memory is layed out such that the secondary processor
  * spin code sits at 0x0000...0x00ff. On server, the vectors follow
@@ -165,6 +166,12 @@ exception_marker:
 #include "exceptions-64s.S"
 #endif
 
+/* KVM trampoline code needs to be close to the interrupt handlers */
+
+#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
+#include "../kvm/book3s_64_rmhandlers.S"
+#endif
+
 _GLOBAL(generic_secondary_thread_init)
 	mr	r24,r3
 
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 17/27] Make head_64.S aware of KVM real mode code
@ 2009-10-30 15:47   ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti,
	Olof Johansson, linuxppc-dev

We need to run some KVM trampoline code in real mode. Unfortunately, real mode
only covers 8MB on Cell so we need to squeeze ourselves as low as possible.

Also, we need to trap interrupts to get us back from guest state to host state
without telling Linux about it.

This patch adds interrupt traps and includes the KVM code that requires real
mode in the real mode parts of Linux.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/include/asm/exception-64s.h |    2 ++
 arch/powerpc/kernel/exceptions-64s.S     |    8 ++++++++
 arch/powerpc/kernel/head_64.S            |    7 +++++++
 3 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index a98653b..57c4000 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -147,6 +147,7 @@
 	.globl label##_pSeries;				\
 label##_pSeries:					\
 	HMT_MEDIUM;					\
+	DO_KVM	n;					\
 	mtspr	SPRN_SPRG_SCRATCH0,r13;		/* save r13 */	\
 	EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, label##_common)
 
@@ -170,6 +171,7 @@ label##_pSeries:					\
 	.globl label##_pSeries;						\
 label##_pSeries:							\
 	HMT_MEDIUM;							\
+	DO_KVM	n;							\
 	mtspr	SPRN_SPRG_SCRATCH0,r13;	/* save r13 */			\
 	mfspr	r13,SPRN_SPRG_PACA;	/* get paca address into r13 */	\
 	std	r9,PACA_EXGEN+EX_R9(r13);	/* save r9, r10 */	\
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 1808876..fc3ead0 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -41,6 +41,7 @@ __start_interrupts:
 	. = 0x200
 _machine_check_pSeries:
 	HMT_MEDIUM
+	DO_KVM	0x200
 	mtspr	SPRN_SPRG_SCRATCH0,r13		/* save r13 */
 	EXCEPTION_PROLOG_PSERIES(PACA_EXMC, machine_check_common)
 
@@ -48,6 +49,7 @@ _machine_check_pSeries:
 	.globl data_access_pSeries
 data_access_pSeries:
 	HMT_MEDIUM
+	DO_KVM	0x300
 	mtspr	SPRN_SPRG_SCRATCH0,r13
 BEGIN_FTR_SECTION
 	mfspr	r13,SPRN_SPRG_PACA
@@ -77,6 +79,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_SLB)
 	.globl data_access_slb_pSeries
 data_access_slb_pSeries:
 	HMT_MEDIUM
+	DO_KVM	0x380
 	mtspr	SPRN_SPRG_SCRATCH0,r13
 	mfspr	r13,SPRN_SPRG_PACA		/* get paca address into r13 */
 	std	r3,PACA_EXSLB+EX_R3(r13)
@@ -115,6 +118,7 @@ data_access_slb_pSeries:
 	.globl instruction_access_slb_pSeries
 instruction_access_slb_pSeries:
 	HMT_MEDIUM
+	DO_KVM	0x480
 	mtspr	SPRN_SPRG_SCRATCH0,r13
 	mfspr	r13,SPRN_SPRG_PACA		/* get paca address into r13 */
 	std	r3,PACA_EXSLB+EX_R3(r13)
@@ -154,6 +158,7 @@ instruction_access_slb_pSeries:
 	.globl	system_call_pSeries
 system_call_pSeries:
 	HMT_MEDIUM
+	DO_KVM	0xc00
 BEGIN_FTR_SECTION
 	cmpdi	r0,0x1ebe
 	beq-	1f
@@ -186,12 +191,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_REAL_LE)
 	 * trickery is thus necessary
 	 */
 	. = 0xf00
+	DO_KVM	0xf00
 	b	performance_monitor_pSeries
 
 	. = 0xf20
+	DO_KVM	0xf20
 	b	altivec_unavailable_pSeries
 
 	. = 0xf40
+	DO_KVM	0xf40
 	b	vsx_unavailable_pSeries
 
 #ifdef CONFIG_CBE_RAS
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index c38afdb..9258074 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -37,6 +37,7 @@
 #include <asm/firmware.h>
 #include <asm/page_64.h>
 #include <asm/irqflags.h>
+#include <asm/kvm_book3s_64_asm.h>
 
 /* The physical memory is layed out such that the secondary processor
  * spin code sits at 0x0000...0x00ff. On server, the vectors follow
@@ -165,6 +166,12 @@ exception_marker:
 #include "exceptions-64s.S"
 #endif
 
+/* KVM trampoline code needs to be close to the interrupt handlers */
+
+#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
+#include "../kvm/book3s_64_rmhandlers.S"
+#endif
+
 _GLOBAL(generic_secondary_thread_init)
 	mr	r24,r3
 
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread
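
The DO_KVM hook added above sits at the very start of each exception vector.
Conceptually it checks the "are we inside a guest?" flag that a later patch in
this series adds to the PACA and, if the flag is set, branches to the KVM
real-mode trampoline instead of the normal Linux prolog. A minimal C-like
sketch of that logic (the real hook is a few assembly instructions, and
kvmppc_exit_to_host() is a made-up name, not a symbol from this series):

  /* Sketch only: what a DO_KVM(n)-style hook does at exception vector n. */
  if (get_paca()->kvm_in_guest)
          kvmppc_exit_to_host(n);   /* leave the guest, hand control to KVM */
  /* otherwise fall through to the regular Linux exception prolog */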

* [PATCH 18/27] Add Book3s_64 offsets to asm-offsets.c
  2009-10-30 15:47 ` Alexander Graf
@ 2009-10-30 15:47   ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti,
	Olof Johansson, linuxppc-dev

We need to access some VCPU fields from assembly code. In order to get
the proper offsets, we have to define them in asm-offsets.c.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kernel/asm-offsets.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 0812b0f..aba3ea6 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -398,6 +398,19 @@ int main(void)
 	DEFINE(VCPU_LAST_INST, offsetof(struct kvm_vcpu, arch.last_inst));
 	DEFINE(VCPU_FAULT_DEAR, offsetof(struct kvm_vcpu, arch.fault_dear));
 	DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr));
+
+	/* book3s_64 */
+#ifdef CONFIG_PPC64
+	DEFINE(VCPU_FAULT_DSISR, offsetof(struct kvm_vcpu, arch.fault_dsisr));
+	DEFINE(VCPU_HOST_RETIP, offsetof(struct kvm_vcpu, arch.host_retip));
+	DEFINE(VCPU_HOST_R2, offsetof(struct kvm_vcpu, arch.host_r2));
+	DEFINE(VCPU_HOST_MSR, offsetof(struct kvm_vcpu, arch.host_msr));
+	DEFINE(VCPU_SHADOW_MSR, offsetof(struct kvm_vcpu, arch.shadow_msr));
+	DEFINE(VCPU_TRAMPOLINE_LOWMEM, offsetof(struct kvm_vcpu, arch.trampoline_lowmem));
+	DEFINE(VCPU_TRAMPOLINE_ENTER, offsetof(struct kvm_vcpu, arch.trampoline_enter));
+	DEFINE(VCPU_HIGHMEM_HANDLER, offsetof(struct kvm_vcpu, arch.highmem_handler));
+	DEFINE(VCPU_HFLAGS, offsetof(struct kvm_vcpu, arch.hflags));
+#endif
 #endif
 #ifdef CONFIG_44x
 	DEFINE(PGD_T_LOG2, PGD_T_LOG2);
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread
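
As background: asm-offsets.c is never linked into the kernel. The build
compiles it and converts every DEFINE() into a plain #define in a generated
header that assembly code can include. The entries above therefore end up as
lines roughly like the following (the numeric offsets are made up purely for
illustration):

  /* generated asm-offsets header (illustrative values only) */
  #define VCPU_SHADOW_MSR        0x2a0  /* offsetof(struct kvm_vcpu, arch.shadow_msr) */
  #define VCPU_TRAMPOLINE_ENTER  0x2b0  /* offsetof(struct kvm_vcpu, arch.trampoline_enter) */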

* [PATCH 19/27] Export symbols for KVM module
  2009-10-30 15:47 ` Alexander Graf
@ 2009-10-30 15:47     ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti,
	Olof Johansson, linuxppc-dev

We want to be able to build KVM as a module. To make that possible, we
need a few more exports from core kernel code.

This patch exports all functions and variables that KVM requires.

Signed-off-by: Alexander Graf <agraf@suse.de>

---

v3 -> v4:

  - don't export switch_slb
  - don't export init_context
  - don't export mm_alloc
---
 arch/powerpc/kernel/ppc_ksyms.c |    3 ++-
 arch/powerpc/kernel/time.c      |    1 +
 arch/powerpc/mm/hash_utils_64.c |    2 ++
 3 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
index c8b27bb..baf778c 100644
--- a/arch/powerpc/kernel/ppc_ksyms.c
+++ b/arch/powerpc/kernel/ppc_ksyms.c
@@ -163,11 +163,12 @@ EXPORT_SYMBOL(screen_info);
 #ifdef CONFIG_PPC32
 EXPORT_SYMBOL(timer_interrupt);
 EXPORT_SYMBOL(irq_desc);
-EXPORT_SYMBOL(tb_ticks_per_jiffy);
 EXPORT_SYMBOL(cacheable_memcpy);
 EXPORT_SYMBOL(cacheable_memzero);
 #endif
 
+EXPORT_SYMBOL(tb_ticks_per_jiffy);
+
 #ifdef CONFIG_PPC32
 EXPORT_SYMBOL(switch_mmu_context);
 #endif
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 92dc844..e05f6af 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -268,6 +268,7 @@ void account_system_vtime(struct task_struct *tsk)
 	per_cpu(cputime_scaled_last_delta, smp_processor_id()) = deltascaled;
 	local_irq_restore(flags);
 }
+EXPORT_SYMBOL_GPL(account_system_vtime);
 
 /*
  * Transfer the user and system times accumulated in the paca
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 1ade7eb..2b2a4aa 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -92,6 +92,7 @@ struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT];
 struct hash_pte *htab_address;
 unsigned long htab_size_bytes;
 unsigned long htab_hash_mask;
+EXPORT_SYMBOL_GPL(htab_hash_mask);
 int mmu_linear_psize = MMU_PAGE_4K;
 int mmu_virtual_psize = MMU_PAGE_4K;
 int mmu_vmalloc_psize = MMU_PAGE_4K;
@@ -102,6 +103,7 @@ int mmu_io_psize = MMU_PAGE_4K;
 int mmu_kernel_ssize = MMU_SEGSIZE_256M;
 int mmu_highuser_ssize = MMU_SEGSIZE_256M;
 u16 mmu_slb_size = 64;
+EXPORT_SYMBOL_GPL(mmu_slb_size);
 #ifdef CONFIG_HUGETLB_PAGE
 unsigned int HPAGE_SHIFT;
 #endif
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread
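
Without these exports a modular kvm.ko would fail to load with unresolved
symbols as soon as it references the variables above. A small sketch of the
kind of module-side use this enables (illustrative only, not code from this
series):

  /* In the KVM module: these now resolve against the exports above. */
  extern unsigned long tb_ticks_per_jiffy;
  extern u16 mmu_slb_size;

  static inline u64 sketch_jiffies_to_tb(unsigned long j)
  {
          /* convert a jiffies-based interval into timebase ticks */
          return (u64)j * tb_ticks_per_jiffy;
  }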

* [PATCH 20/27] Split init_new_context and destroy_context
  2009-10-30 15:47 ` Alexander Graf
@ 2009-10-30 15:47     ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti,
	Olof Johansson, linuxppc-dev

For KVM we need to allocate a new context id, but we don't really care about
all the mm context that usually comes with it.

So let's split the alloc and destroy functions for the context id, so that we
can grab an id without allocating a full mm context.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/include/asm/mmu_context.h |    5 +++++
 arch/powerpc/mm/mmu_context_hash64.c   |   24 +++++++++++++++++++++---
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index b34e94d..66b35d0 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -23,6 +23,11 @@ extern void switch_slb(struct task_struct *tsk, struct mm_struct *mm);
 extern void set_context(unsigned long id, pgd_t *pgd);
 
 #ifdef CONFIG_PPC_BOOK3S_64
+extern int __init_new_context(void);
+extern void __destroy_context(int context_id);
+#endif
+
+#ifdef CONFIG_PPC_BOOK3S_64
 static inline void mmu_context_init(void) { }
 #else
 extern void mmu_context_init(void);
diff --git a/arch/powerpc/mm/mmu_context_hash64.c b/arch/powerpc/mm/mmu_context_hash64.c
index dbeb86a..b9e4cc2 100644
--- a/arch/powerpc/mm/mmu_context_hash64.c
+++ b/arch/powerpc/mm/mmu_context_hash64.c
@@ -18,6 +18,7 @@
 #include <linux/mm.h>
 #include <linux/spinlock.h>
 #include <linux/idr.h>
+#include <linux/module.h>
 
 #include <asm/mmu_context.h>
 
@@ -32,7 +33,7 @@ static DEFINE_IDR(mmu_context_idr);
 #define NO_CONTEXT	0
 #define MAX_CONTEXT	((1UL << 19) - 1)
 
-int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
+int __init_new_context(void)
 {
 	int index;
 	int err;
@@ -57,6 +58,18 @@ again:
 		return -ENOMEM;
 	}
 
+	return index;
+}
+EXPORT_SYMBOL_GPL(__init_new_context);
+
+int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
+{
+	int index;
+
+	index = __init_new_context();
+	if (index < 0)
+		return index;
+
 	/* The old code would re-promote on fork, we don't do that
 	 * when using slices as it could cause problem promoting slices
 	 * that have been forced down to 4K
@@ -68,11 +81,16 @@ again:
 	return 0;
 }
 
-void destroy_context(struct mm_struct *mm)
+void __destroy_context(int context_id)
 {
 	spin_lock(&mmu_context_lock);
-	idr_remove(&mmu_context_idr, mm->context.id);
+	idr_remove(&mmu_context_idr, context_id);
 	spin_unlock(&mmu_context_lock);
+}
+EXPORT_SYMBOL_GPL(__destroy_context);
 
+void destroy_context(struct mm_struct *mm)
+{
+	__destroy_context(mm->context.id);
 	mm->context.id = NO_CONTEXT;
 }
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread
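
The point of the split is that __init_new_context() and __destroy_context()
hand out and retire a bare context id without any mm_struct behind it, which
is all the KVM host MMU code needs for guest segments. A minimal usage sketch
(the function names are made up; the error handling mirrors the kernel code
above):

  /* Sketch of a module-side caller of the new helpers. */
  int sketch_alloc_context_id(void)
  {
          int context_id = __init_new_context();

          if (context_id < 0)
                  return context_id;      /* -ENOMEM from the IDR allocator */

          /* ... derive guest VSIDs from context_id ... */
          return context_id;
  }

  void sketch_free_context_id(int context_id)
  {
          __destroy_context(context_id);  /* just drops the id from the IDR */
  }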

* [PATCH 21/27] Export KVM symbols for module
  2009-10-30 15:47 ` Alexander Graf
@ 2009-10-30 15:47     ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti,
	Olof Johansson, linuxppc-dev

To be able to keep KVM as a module, we need to export the SLB trampoline
addresses to the module, so it knows where to jump to.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kvm/book3s_64_exports.c |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_64_exports.c

diff --git a/arch/powerpc/kvm/book3s_64_exports.c b/arch/powerpc/kvm/book3s_64_exports.c
new file mode 100644
index 0000000..5b2db38
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_64_exports.c
@@ -0,0 +1,24 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright SUSE Linux Products GmbH 2009
+ *
+ * Authors: Alexander Graf <agraf@suse.de>
+ */
+
+#include <linux/module.h>
+#include <asm/kvm_book3s.h>
+
+EXPORT_SYMBOL_GPL(kvmppc_trampoline_enter);
+EXPORT_SYMBOL_GPL(kvmppc_trampoline_lowmem);
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread
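
The two symbols are labels in the KVM real-mode assembly that an earlier patch
includes into head_64.S; exporting them lets the module pick up their
addresses at run time. A hedged sketch of the intended use, matching the
trampoline fields defined in the asm-offsets patch (treat the assignments as
illustrative rather than as the literal book3s.c code):

  /* During vcpu setup in the module: remember where the low-memory
   * trampolines live, so the interrupt/exit path knows where to jump. */
  vcpu->arch.trampoline_lowmem = kvmppc_trampoline_lowmem;
  vcpu->arch.trampoline_enter  = kvmppc_trampoline_enter;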

* [PATCH 22/27] Add fields to PACA
  2009-10-30 15:47 ` Alexander Graf
@ 2009-10-30 15:47     ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti,
	Olof Johansson, linuxppc-dev

For KVM we need to store some information in the PACA, so we
need to extend it.

This patch adds the KVM SLB shadow-related entries to the PACA and
a field that indicates whether we're inside a guest.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/include/asm/paca.h |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 7d8514c..5e9b4ef 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -129,6 +129,15 @@ struct paca_struct {
 	u64 system_time;		/* accumulated system TB ticks */
 	u64 startpurr;			/* PURR/TB value snapshot */
 	u64 startspurr;			/* SPURR value snapshot */
+
+#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
+	struct  {
+		u64     esid;
+		u64     vsid;
+	} kvm_slb[64];			/* guest SLB */
+	u8 kvm_slb_max;			/* highest used guest slb entry */
+	u8 kvm_in_guest;		/* are we inside the guest? */
+#endif
 };
 
 extern struct paca_struct paca[];
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread
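
For a sense of scale: each kvm_slb entry is two u64s, so the shadow array adds

  64 entries * (8 bytes esid + 8 bytes vsid) = 1024 bytes

to every CPU's PACA, plus the two u8 flags. That is the cost of keeping the
guest SLB somewhere the real-mode entry/exit code can reach, since the
exception path can always find the PACA through SPRN_SPRG_PACA.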

* [PATCH 23/27] Export new PACA constants in asm-offsets
  2009-10-30 15:47 ` Alexander Graf
@ 2009-10-30 15:47   ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti,
	Olof Johansson, linuxppc-dev

In order to access fields in the PACA from assembly code, we need
to generate offsets using asm-offsets.c.

So let's add the new PACA-related bits we just introduced!

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kernel/asm-offsets.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index aba3ea6..e2e2082 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -190,6 +190,11 @@ int main(void)
 	DEFINE(PACA_SYSTEM_TIME, offsetof(struct paca_struct, system_time));
 	DEFINE(PACA_DATA_OFFSET, offsetof(struct paca_struct, data_offset));
 	DEFINE(PACA_TRAP_SAVE, offsetof(struct paca_struct, trap_save));
+#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
+	DEFINE(PACA_KVM_IN_GUEST, offsetof(struct paca_struct, kvm_in_guest));
+	DEFINE(PACA_KVM_SLB, offsetof(struct paca_struct, kvm_slb));
+	DEFINE(PACA_KVM_SLB_MAX, offsetof(struct paca_struct, kvm_slb_max));
+#endif
 #endif /* CONFIG_PPC64 */
 
 	/* RTAS */
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 24/27] Include Book3s_64 target in buildsystem
  2009-10-30 15:47 ` Alexander Graf
@ 2009-10-30 15:47   ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti,
	Olof Johansson, linuxppc-dev

Now we have everything in place to be able to build KVM, so let's add it
as a config option and wire it up in the Makefile.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kvm/Kconfig  |   17 +++++++++++++++++
 arch/powerpc/kvm/Makefile |   27 +++++++++++++++++++++++----
 2 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index c299268..07703f7 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -21,6 +21,23 @@ config KVM
 	select PREEMPT_NOTIFIERS
 	select ANON_INODES
 
+config KVM_BOOK3S_64_HANDLER
+	bool
+
+config KVM_BOOK3S_64
+	tristate "KVM support for PowerPC book3s_64 processors"
+	depends on EXPERIMENTAL && PPC64
+	select KVM
+	select KVM_BOOK3S_64_HANDLER
+	---help---
+	  Support running unmodified book3s_64 and book3s_32 guest kernels
+	  in virtual machines on book3s_64 host processors.
+
+	  This module provides access to the hardware capabilities through
+	  a character device node named /dev/kvm.
+
+	  If unsure, say N.
+
 config KVM_440
 	bool "KVM support for PowerPC 440 processors"
 	depends on EXPERIMENTAL && 44x
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 37655fe..56484d6 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -12,26 +12,45 @@ CFLAGS_44x_tlb.o  := -I.
 CFLAGS_e500_tlb.o := -I.
 CFLAGS_emulate.o  := -I.
 
-kvm-objs := $(common-objs-y) powerpc.o emulate.o
+common-objs-y += powerpc.o emulate.o
 obj-$(CONFIG_KVM_EXIT_TIMING) += timing.o
-obj-$(CONFIG_KVM) += kvm.o
+obj-$(CONFIG_KVM_BOOK3S_64_HANDLER) += book3s_64_exports.o
 
 AFLAGS_booke_interrupts.o := -I$(obj)
 
 kvm-440-objs := \
+	$(common-objs-y) \
 	booke.o \
 	booke_emulate.o \
 	booke_interrupts.o \
 	44x.o \
 	44x_tlb.o \
 	44x_emulate.o
-obj-$(CONFIG_KVM_440) += kvm-440.o
+kvm-objs-$(CONFIG_KVM_440) := $(kvm-440-objs)
 
 kvm-e500-objs := \
+	$(common-objs-y) \
 	booke.o \
 	booke_emulate.o \
 	booke_interrupts.o \
 	e500.o \
 	e500_tlb.o \
 	e500_emulate.o
-obj-$(CONFIG_KVM_E500) += kvm-e500.o
+kvm-objs-$(CONFIG_KVM_E500) := $(kvm-e500-objs)
+
+kvm-book3s_64-objs := \
+	$(common-objs-y) \
+	book3s.o \
+	book3s_64_emulate.o \
+	book3s_64_interrupts.o \
+	book3s_64_mmu_host.o \
+	book3s_64_mmu.o \
+	book3s_32_mmu.o
+kvm-objs-$(CONFIG_KVM_BOOK3S_64) := $(kvm-book3s_64-objs)
+
+kvm-objs := $(kvm-objs-m) $(kvm-objs-y)
+
+obj-$(CONFIG_KVM_440) += kvm.o
+obj-$(CONFIG_KVM_E500) += kvm.o
+obj-$(CONFIG_KVM_BOOK3S_64) += kvm.o
+
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread

* [PATCH 25/27] Fix trace.h
  2009-10-30 15:47 ` Alexander Graf
@ 2009-10-30 15:47   ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti,
	Olof Johansson, linuxppc-dev

It looks like the identifier "pc" is already defined somewhere else. At least
the current code always failed to build for me, complaining that "pc" is
already defined.

Let's use _pc instead, because that doesn't collide.

Is this the right approach? Does it break on 440 too? If not, why not?

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/kvm/trace.h |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/trace.h b/arch/powerpc/kvm/trace.h
index 67f219d..a8e8400 100644
--- a/arch/powerpc/kvm/trace.h
+++ b/arch/powerpc/kvm/trace.h
@@ -12,8 +12,8 @@
  * Tracepoint for guest mode entry.
  */
 TRACE_EVENT(kvm_ppc_instr,
-	TP_PROTO(unsigned int inst, unsigned long pc, unsigned int emulate),
-	TP_ARGS(inst, pc, emulate),
+	TP_PROTO(unsigned int inst, unsigned long _pc, unsigned int emulate),
+	TP_ARGS(inst, _pc, emulate),
 
 	TP_STRUCT__entry(
 		__field(	unsigned int,	inst		)
@@ -23,7 +23,7 @@ TRACE_EVENT(kvm_ppc_instr,
 
 	TP_fast_assign(
 		__entry->inst		= inst;
-		__entry->pc		= pc;
+		__entry->pc		= _pc;
 		__entry->emulate	= emulate;
 	),
 
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread
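
For context: TP_PROTO() parameters end up as ordinary C parameter names in the
functions that TRACE_EVENT() generates, so a parameter whose name collides
with a macro pulled in by some earlier header breaks the expansion. Which
header defines "pc" in this configuration is not identified above, so the
following only illustrates the mechanism (the #define is invented):

  /* If any header included before trace.h contained something like ... */
  #define pc (current->thread.regs->nip)   /* hypothetical definition */

  /* ... then this generated prototype would no longer parse: */
  void trace_kvm_ppc_instr(unsigned int inst, unsigned long pc, unsigned int emulate);

  /* ... while the renamed parameter is unaffected: */
  void trace_kvm_ppc_instr(unsigned int inst, unsigned long _pc, unsigned int emulate);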

* [PATCH 26/27] Use Little Endian for Dirty Bitmap
  2009-10-30 15:47 ` Alexander Graf
@ 2009-10-30 15:47   ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Avi Kivity, kvm-ppc, Hollis Blanchard, Arnd Bergmann,
	Benjamin Herrenschmidt, Kevin Wolf, bphilips, Marcelo Tosatti,
	Olof Johansson, linuxppc-dev

We currently use host-endian long types to store information
in the dirty bitmap.

This works reasonably well on Little Endian targets, because the
u32 after the first one contains the next 32 bits. On Big Endian this
breaks completely though, forcing us to be inventive here.

So Ben suggested always using Little Endian, which looks reasonable.

So far we only have the dirty bitmap implemented on Little Endian targets,
and since PowerPC would be the first Big Endian platform, we might just as
well always switch to Little Endian, with little effort and without breaking
existing targets.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 virt/kvm/kvm_main.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index bd44fb4..972f9fb 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -49,6 +49,7 @@
 #include <asm/io.h>
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
+#include <asm-generic/bitops/le.h>
 
 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
 #include "coalesced_mmio.h"
@@ -1071,8 +1072,8 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
 		unsigned long rel_gfn = gfn - memslot->base_gfn;
 
 		/* avoid RMW */
-		if (!test_bit(rel_gfn, memslot->dirty_bitmap))
-			set_bit(rel_gfn, memslot->dirty_bitmap);
+		if (!generic_test_le_bit(rel_gfn, memslot->dirty_bitmap))
+			generic___set_le_bit(rel_gfn, memslot->dirty_bitmap);
 	}
 }
 
-- 
1.6.0.2


^ permalink raw reply related	[flat|nested] 244+ messages in thread
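
To make the breakage concrete: set_bit() and test_bit() operate on host-endian
unsigned longs, so the same logical bit lands at a different memory offset
depending on host endianness. A short worked example for the dirty bit of
rel_gfn 0, i.e. bit 0 of the first 64-bit word:

  /*
   * 64-bit Little Endian host: the bit lands in byte 0 of the bitmap,
   *                            bit offset 0 when scanned in memory order.
   * 64-bit Big Endian host:    the bit lands in byte 7 of the bitmap,
   *                            bit offset 56 when scanned in memory order.
   *
   * generic___set_le_bit()/generic_test_le_bit() always use the Little
   * Endian layout, so both host types now produce identical bytes.
   */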

* [PATCH 27/27] Use hrtimers for the decrementer
@ 2009-10-30 15:47     ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-30 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Kevin Wolf, Arnd Bergmann, Hollis Blanchard, Marcelo Tosatti,
	kvm-ppc, linuxppc-dev, Avi Kivity, bphilips, Olof Johansson

Following S390's good example we should use hrtimers for the decrementer too!
This patch converts the timer from the old mechanism to hrtimers.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/include/asm/kvm_host.h |    6 ++++--
 arch/powerpc/kvm/emulate.c          |   18 +++++++++++-------
 arch/powerpc/kvm/powerpc.c          |   20 ++++++++++++++++++--
 3 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2cff5fe..1201f62 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -21,7 +21,8 @@
 #define __POWERPC_KVM_HOST_H__
 
 #include <linux/mutex.h>
-#include <linux/timer.h>
+#include <linux/hrtimer.h>
+#include <linux/interrupt.h>
 #include <linux/types.h>
 #include <linux/kvm_types.h>
 #include <asm/kvm_asm.h>
@@ -250,7 +251,8 @@ struct kvm_vcpu_arch {
 
 	u32 cpr0_cfgaddr; /* holds the last set cpr0_cfgaddr */
 
-	struct timer_list dec_timer;
+	struct hrtimer dec_timer;
+	struct tasklet_struct tasklet;
 	u64 dec_jiffies;
 	unsigned long pending_exceptions;
 
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index 1ec5e07..4a9ac66 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -18,7 +18,7 @@
  */
 
 #include <linux/jiffies.h>
-#include <linux/timer.h>
+#include <linux/hrtimer.h>
 #include <linux/types.h>
 #include <linux/string.h>
 #include <linux/kvm_host.h>
@@ -79,12 +79,13 @@ static int kvmppc_dec_enabled(struct kvm_vcpu *vcpu)
 
 void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
 {
-	unsigned long nr_jiffies;
+	unsigned long dec_nsec;
 
+	pr_debug("mtDEC: %x\n", vcpu->arch.dec);
 #ifdef CONFIG_PPC64
 	/* POWER4+ triggers a dec interrupt if the value is < 0 */
 	if (vcpu->arch.dec & 0x80000000) {
-		del_timer(&vcpu->arch.dec_timer);
+		hrtimer_try_to_cancel(&vcpu->arch.dec_timer);
 		kvmppc_core_queue_dec(vcpu);
 		return;
 	}
@@ -94,12 +95,15 @@ void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
 		 * that's how we convert the guest DEC value to the number of
 		 * host ticks. */
 
+		hrtimer_try_to_cancel(&vcpu->arch.dec_timer);
+		dec_nsec = vcpu->arch.dec;
+		dec_nsec *= 1000;
+		dec_nsec /= tb_ticks_per_usec;
+		hrtimer_start(&vcpu->arch.dec_timer, ktime_set(0, dec_nsec),
+			      HRTIMER_MODE_REL);
 		vcpu->arch.dec_jiffies = get_tb();
-		nr_jiffies = vcpu->arch.dec / tb_ticks_per_jiffy;
-		mod_timer(&vcpu->arch.dec_timer,
-		          get_jiffies_64() + nr_jiffies);
 	} else {
-		del_timer(&vcpu->arch.dec_timer);
+		hrtimer_try_to_cancel(&vcpu->arch.dec_timer);
 	}
 }
 
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 4ae3490..4c582ed 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -23,6 +23,7 @@
 #include <linux/kvm_host.h>
 #include <linux/module.h>
 #include <linux/vmalloc.h>
+#include <linux/hrtimer.h>
 #include <linux/fs.h>
 #include <asm/cputable.h>
 #include <asm/uaccess.h>
@@ -209,10 +210,25 @@ static void kvmppc_decrementer_func(unsigned long data)
 	}
 }
 
+/*
+ * low level hrtimer wake routine. Because this runs in hardirq context
+ * we schedule a tasklet to do the real work.
+ */
+enum hrtimer_restart kvmppc_decrementer_wakeup(struct hrtimer *timer)
+{
+	struct kvm_vcpu *vcpu;
+
+	vcpu = container_of(timer, struct kvm_vcpu, arch.dec_timer);
+	tasklet_schedule(&vcpu->arch.tasklet);
+
+	return HRTIMER_NORESTART;
+}
+
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
-	setup_timer(&vcpu->arch.dec_timer, kvmppc_decrementer_func,
-	            (unsigned long)vcpu);
+	hrtimer_init(&vcpu->arch.dec_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS);
+	tasklet_init(&vcpu->arch.tasklet, kvmppc_decrementer_func, (ulong)vcpu);
+	vcpu->arch.dec_timer.function = kvmppc_decrementer_wakeup;
 
 	return 0;
 }
-- 
1.6.0.2

^ permalink raw reply related	[flat|nested] 244+ messages in thread
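
The DEC-to-hrtimer-expiry conversion in kvmppc_emulate_dec() above is just a
ticks-to-nanoseconds calculation.  A quick standalone check of the arithmetic,
where 512 MHz is an assumed example timebase frequency rather than a value
from the patch:

	#include <stdio.h>

	int main(void)
	{
		unsigned long tb_freq = 512000000UL;		/* example timebase, 512 MHz */
		unsigned long tb_ticks_per_usec = tb_freq / 1000000;
		unsigned int dec = 0x00100000;			/* guest-programmed DEC value */

		/* Multiply before dividing, as the patch does, to keep precision. */
		unsigned long dec_nsec = (unsigned long)dec * 1000 / tb_ticks_per_usec;

		printf("DEC %#x -> %lu ns (~%lu us)\n", dec, dec_nsec, dec_nsec / 1000);
		return 0;
	}

which comes out at roughly 2 ms for this value; hrtimer_start() is then armed
relative to now, while the POWER4+ negative-DEC case short-circuits to an
immediate interrupt instead.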

* Re: [PATCH 19/27] Export symbols for KVM module
@ 2009-10-31  4:37         ` Stephen Rothwell
  0 siblings, 0 replies; 244+ messages in thread
From: Stephen Rothwell @ 2009-10-31  4:37 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Kevin Wolf, Arnd Bergmann, Hollis Blanchard, Marcelo Tosatti,
	kvm-ppc, linuxppc-dev, Avi Kivity, kvm, bphilips, Olof Johansson

[-- Attachment #1: Type: text/plain, Size: 838 bytes --]

Hi Alexander,

On Fri, 30 Oct 2009 16:47:19 +0100 Alexander Graf <agraf@suse.de> wrote:
>
> diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
> index c8b27bb..baf778c 100644
> --- a/arch/powerpc/kernel/ppc_ksyms.c
> +++ b/arch/powerpc/kernel/ppc_ksyms.c
> @@ -163,11 +163,12 @@ EXPORT_SYMBOL(screen_info);
>  #ifdef CONFIG_PPC32
>  EXPORT_SYMBOL(timer_interrupt);
>  EXPORT_SYMBOL(irq_desc);
> -EXPORT_SYMBOL(tb_ticks_per_jiffy);
>  EXPORT_SYMBOL(cacheable_memcpy);
>  EXPORT_SYMBOL(cacheable_memzero);
>  #endif
>  
> +EXPORT_SYMBOL(tb_ticks_per_jiffy);

Since you are moving this anyway, how about moving it into
arch/powerpc/kernel/time.c, where tb_ticks_per_jiffy is defined?

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 244+ messages in thread
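
The suggested end state would look roughly like this in
arch/powerpc/kernel/time.c (a sketch with the surrounding code omitted, not
the actual file contents):

	#include <linux/module.h>	/* EXPORT_SYMBOL */

	unsigned long tb_ticks_per_jiffy;
	EXPORT_SYMBOL(tb_ticks_per_jiffy);

i.e. the export lives next to the definition instead of in ppc_ksyms.c.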

* Re: [PATCH 20/27] Split init_new_context and destroy_context
  2009-10-30 15:47     ` Alexander Graf
  (?)
@ 2009-10-31  4:40       ` Stephen Rothwell
  -1 siblings, 0 replies; 244+ messages in thread
From: Stephen Rothwell @ 2009-10-31  4:40 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm, Kevin Wolf, Arnd Bergmann, Hollis Blanchard,
	Marcelo Tosatti, kvm-ppc, linuxppc-dev, Avi Kivity, bphilips,
	Olof Johansson

[-- Attachment #1: Type: text/plain, Size: 668 bytes --]

Hi Alexander,

On Fri, 30 Oct 2009 16:47:20 +0100 Alexander Graf <agraf@suse.de> wrote:
>
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -23,6 +23,11 @@ extern void switch_slb(struct task_struct *tsk, struct mm_struct *mm);
>  extern void set_context(unsigned long id, pgd_t *pgd);
>  
>  #ifdef CONFIG_PPC_BOOK3S_64
> +extern int __init_new_context(void);
> +extern void __destroy_context(int context_id);
> +#endif
> +
> +#ifdef CONFIG_PPC_BOOK3S_64

don't add the #endif/#ifdef pair ...

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 244+ messages in thread
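
In other words, the new declarations would go inside the existing block rather
than closing and immediately reopening it; a sketch of the merged form:

	#ifdef CONFIG_PPC_BOOK3S_64
	extern int __init_new_context(void);
	extern void __destroy_context(int context_id);
	/* ... the declarations already in this block continue here ... */
	#endif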

* Re: [PATCH 19/27] Export symbols for KVM module
@ 2009-10-31 12:02             ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-31 12:02 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Kevin Wolf, Arnd Bergmann, Hollis Blanchard, Marcelo Tosatti,
	kvm-ppc, linuxppc-dev, Avi Kivity, kvm, bphilips, Olof Johansson


Am 31.10.2009 um 05:37 schrieb Stephen Rothwell <sfr@canb.auug.org.au>:

> Hi Alexander,
>
> On Fri, 30 Oct 2009 16:47:19 +0100 Alexander Graf <agraf@suse.de>  
> wrote:
>>
>> diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
>> index c8b27bb..baf778c 100644
>> --- a/arch/powerpc/kernel/ppc_ksyms.c
>> +++ b/arch/powerpc/kernel/ppc_ksyms.c
>> @@ -163,11 +163,12 @@ EXPORT_SYMBOL(screen_info);
>> #ifdef CONFIG_PPC32
>> EXPORT_SYMBOL(timer_interrupt);
>> EXPORT_SYMBOL(irq_desc);
>> -EXPORT_SYMBOL(tb_ticks_per_jiffy);
>> EXPORT_SYMBOL(cacheable_memcpy);
>> EXPORT_SYMBOL(cacheable_memzero);
>> #endif
>>
>> +EXPORT_SYMBOL(tb_ticks_per_jiffy);
>
> Since you are moving this anyway, how about moving it into
> arch/powerpc/kernel/time.c where tb_ticks_per_jiffy is defined.

Well the fun part is that the hrtimer conversion patch actually  
deprecates this change.

I merely forgot to change the export back.

So I suppose I'll leave things here as is and then revert this chunk  
in the hrtimer patch.

Alex

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 20/27] Split init_new_context and destroy_context
  2009-10-31  4:40       ` Stephen Rothwell
  (?)
@ 2009-10-31 21:20         ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-10-31 21:20 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: kvm, Kevin Wolf, Arnd Bergmann, Hollis Blanchard,
	Marcelo Tosatti, kvm-ppc, linuxppc-dev, Avi Kivity, bphilips,
	Olof Johansson

Stephen Rothwell wrote:
> Hi Alexander,
>
> On Fri, 30 Oct 2009 16:47:20 +0100 Alexander Graf <agraf@suse.de> wrote:
>   
>> --- a/arch/powerpc/include/asm/mmu_context.h
>> +++ b/arch/powerpc/include/asm/mmu_context.h
>> @@ -23,6 +23,11 @@ extern void switch_slb(struct task_struct *tsk, struct mm_struct *mm);
>>  extern void set_context(unsigned long id, pgd_t *pgd);
>>  
>>  #ifdef CONFIG_PPC_BOOK3S_64
>> +extern int __init_new_context(void);
>> +extern void __destroy_context(int context_id);
>> +#endif
>> +
>> +#ifdef CONFIG_PPC_BOOK3S_64
>>     
>
> don't add the #endif/#ifdef pair ...

Any other comments? I'd like to wind up a final patch set so I can stop
spamming on all those MLs :-).

Alex

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 20/27] Split init_new_context and destroy_context
  2009-10-31 21:20         ` Alexander Graf
@ 2009-10-31 21:37           ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 244+ messages in thread
From: Benjamin Herrenschmidt @ 2009-10-31 21:37 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Kevin Wolf, Stephen Rothwell, Arnd Bergmann, Hollis Blanchard,
	Marcelo Tosatti, kvm-ppc, linuxppc-dev, Avi Kivity, kvm,
	bphilips, Olof Johansson

On Sat, 2009-10-31 at 22:20 +0100, Alexander Graf wrote:
> Stephen Rothwell wrote:
> > Hi Alexander,
> >
> > On Fri, 30 Oct 2009 16:47:20 +0100 Alexander Graf <agraf@suse.de> wrote:
> >   
> >> --- a/arch/powerpc/include/asm/mmu_context.h
> >> +++ b/arch/powerpc/include/asm/mmu_context.h
> >> @@ -23,6 +23,11 @@ extern void switch_slb(struct task_struct *tsk, struct mm_struct *mm);
> >>  extern void set_context(unsigned long id, pgd_t *pgd);
> >>  
> >>  #ifdef CONFIG_PPC_BOOK3S_64
> >> +extern int __init_new_context(void);
> >> +extern void __destroy_context(int context_id);
> >> +#endif
> >> +
> >> +#ifdef CONFIG_PPC_BOOK3S_64
> >>     
> >
> > don't add the #endif/#ifdef pair ...
> 
> Any other comments? I'd like to wind up a final patch set so I can stop
> spamming on all those MLs :-).

Just send an update for -that- patch to the list.

Ben.

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 08/27] Add SLB switching code for entry/exit
@ 2009-11-01 23:23       ` Michael Neuling
  0 siblings, 0 replies; 244+ messages in thread
From: Michael Neuling @ 2009-11-01 23:23 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm, Kevin Wolf, Arnd Bergmann, Hollis Blanchard,
	Marcelo Tosatti, kvm-ppc, linuxppc-dev, Avi Kivity, bphilips,
	Olof Johansson

> This is the really low level of guest entry/exit code.
> 
> Book3s_64 has an SLB, which stores all ESID -> VSID mappings we're
> currently aware of.
> 
> The segments in the guest differ from the ones on the host, so we need
> to switch the SLB to tell the MMU that we're in a new context.
> 
> So we store a shadow of the guest's SLB in the PACA, switch to that on
> entry and only restore bolted entries on exit, leaving the rest to the
> Linux SLB fault handler.
> 
> That way we get a really clean way of switching the SLB.
> 
> Signed-off-by: Alexander Graf <agraf@suse.de>
> ---
>  arch/powerpc/kvm/book3s_64_slb.S |  277 ++++++++++++++++++++++++++++++++++++++
>  1 files changed, 277 insertions(+), 0 deletions(-)
>  create mode 100644 arch/powerpc/kvm/book3s_64_slb.S
> 
> diff --git a/arch/powerpc/kvm/book3s_64_slb.S b/arch/powerpc/kvm/book3s_64_slb.S
> new file mode 100644
> index 0000000..00a8367
> --- /dev/null
> +++ b/arch/powerpc/kvm/book3s_64_slb.S
> @@ -0,0 +1,277 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
> + *
> + * Copyright SUSE Linux Products GmbH 2009
> + *
> + * Authors: Alexander Graf <agraf@suse.de>
> + */
> +
> +/******************************************************************************
> + *                                                                            *
> + *                               Entry code                                   *
> + *                                                                            *
> + ******************************************************************************/
> +
> +.global kvmppc_handler_trampoline_enter
> +kvmppc_handler_trampoline_enter:
> +
> +	/* Required state:
> +	 *
> +	 * MSR = ~IR|DR
> +	 * R13 = PACA
> +	 * R9 = guest IP
> +	 * R10 = guest MSR
> +	 * R11 = free
> +	 * R12 = free
> +	 * PACA[PACA_EXMC + EX_R9] = guest R9
> +	 * PACA[PACA_EXMC + EX_R10] = guest R10
> +	 * PACA[PACA_EXMC + EX_R11] = guest R11
> +	 * PACA[PACA_EXMC + EX_R12] = guest R12
> +	 * PACA[PACA_EXMC + EX_R13] = guest R13
> +	 * PACA[PACA_EXMC + EX_CCR] = guest CR
> +	 * PACA[PACA_EXMC + EX_R3] = guest XER
> +	 */
> +
> +	mtsrr0	r9
> +	mtsrr1	r10
> +
> +	mtspr	SPRN_SPRG_SCRATCH0, r0
> +
> +	/* Remove LPAR shadow entries */
> +
> +#if SLB_NUM_BOLTED == 3

You could alternatively check the persistent entry in the slb_shadow
buffer.  This would give you a run-time check.  Not sure what's best
though.

> +
> +	ld	r12, PACA_SLBSHADOWPTR(r13)
> +	ld	r10, 0x10(r12)
> +	ld	r11, 0x18(r12)

Can you define something in asm-offsets.c for these magic constants 0x10
and 0x18?  Similarly below.
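
For instance (names made up for this sketch, and assuming the slb_shadow
layout that puts save_area[0] at the 0x10/0x18 used above),
arch/powerpc/kernel/asm-offsets.c could emit:

	DEFINE(SHADOW_SLB_ESID_0, offsetof(struct slb_shadow, save_area[0].esid));
	DEFINE(SHADOW_SLB_VSID_0, offsetof(struct slb_shadow, save_area[0].vsid));

so the assembly can say "ld r10, SHADOW_SLB_ESID_0(r12)" instead of a bare
0x10.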

> +	/* Invalid? Skip. */
> +	rldicl. r0, r10, 37, 63
> +	beq	slb_entry_skip_1
> +	xoris	r9, r10, SLB_ESID_V@h
> +	std	r9, 0x10(r12)
> +slb_entry_skip_1:
> +	ld	r9, 0x20(r12)
> +	/* Invalid? Skip. */
> +	rldicl. r0, r9, 37, 63
> +	beq	slb_entry_skip_2
> +	xoris	r9, r9, SLB_ESID_V@h
> +	std	r9, 0x20(r12)
> +slb_entry_skip_2:
> +	ld	r9, 0x30(r12)
> +	/* Invalid? Skip. */
> +	rldicl. r0, r9, 37, 63
> +	beq	slb_entry_skip_3
> +	xoris	r9, r9, SLB_ESID_V@h
> +	std	r9, 0x30(r12)

Can these 3 be made into a macro?
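
One possible shape for such a macro (illustrative and untested; the numeric
local label lets it be expanded several times):

	.macro UNBOLT_SHADOW_SLB off
		ld	r9, \off(r12)
		/* Invalid? Skip. */
		rldicl. r0, r9, 37, 63
		beq	1f
		xoris	r9, r9, SLB_ESID_V@h
		std	r9, \off(r12)
	1:
	.endm

	UNBOLT_SHADOW_SLB 0x20
	UNBOLT_SHADOW_SLB 0x30

The first entry would still need open coding, since its ESID/VSID are kept in
r10/r11 for the slbie that follows.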

> +slb_entry_skip_3:
> +	
> +#else
> +#error unknown number of bolted entries
> +#endif
> +
> +	/* Flush SLB */
> +
> +	slbia
> +
> +	/* r0 = esid & ESID_MASK */
> +	rldicr  r10, r10, 0, 35
> +	/* r0 |= CLASS_BIT(VSID) */
> +	rldic   r12, r11, 56 - 36, 36
> +	or      r10, r10, r12
> +	slbie	r10
> +
> +	isync
> +
> +	/* Fill SLB with our shadow */
> +
> +	lbz	r12, PACA_KVM_SLB_MAX(r13)
> +	mulli	r12, r12, 16
> +	addi	r12, r12, PACA_KVM_SLB
> +	add	r12, r12, r13
> +
> +	/* for (r11 = kvm_slb; r11 < kvm_slb + kvm_slb_size; r11+=slb_entry) */
> +	li	r11, PACA_KVM_SLB
> +	add	r11, r11, r13
> +
> +slb_loop_enter:
> +
> +	ld	r10, 0(r11)
> +
> +	rldicl. r0, r10, 37, 63
> +	beq	slb_loop_enter_skip
> +
> +	ld	r9, 8(r11)
> +	slbmte	r9, r10

If you're updating the first 3 SLB entries, you need to make sure the SLB
shadow is updated at the same time.  (BTW, dumb question: can we run this
under PHYP?)

> +
> +slb_loop_enter_skip:
> +	addi	r11, r11, 16
> +	cmpd	cr0, r11, r12
> +	blt	slb_loop_enter
> +
> +slb_do_enter:
> +
> +	/* Enter guest */
> +
> +	mfspr	r0, SPRN_SPRG_SCRATCH0
> +
> +	ld	r9, (PACA_EXMC+EX_R9)(r13)
> +	ld	r10, (PACA_EXMC+EX_R10)(r13)
> +	ld	r12, (PACA_EXMC+EX_R12)(r13)
> +
> +	lwz	r11, (PACA_EXMC+EX_CCR)(r13)
> +	mtcr	r11
> +
> +	ld	r11, (PACA_EXMC+EX_R3)(r13)
> +	mtxer	r11
> +
> +	ld	r11, (PACA_EXMC+EX_R11)(r13)
> +	ld	r13, (PACA_EXMC+EX_R13)(r13)
> +
> +	RFI
> +kvmppc_handler_trampoline_enter_end:
> +
> +
> +
> +/******************************************************************************
> + *                                                                            *
> + *                               Exit code                                    *
> + *                                                                            *
> + *****************************************************************************/
> +
> +.global kvmppc_handler_trampoline_exit
> +kvmppc_handler_trampoline_exit:
> +
> +	/* Register usage at this point:
> +	 *
> +	 * SPRG_SCRATCH0 = guest R13
> +	 * R01           = host R1
> +	 * R02           = host R2
> +	 * R10           = guest PC
> +	 * R11           = guest MSR
> +	 * R12           = exit handler id
> +	 * R13           = PACA
> +	 * PACA.exmc.CCR  = guest CR
> +	 * PACA.exmc.R9  = guest R1
> +	 * PACA.exmc.R10 = guest R10
> +	 * PACA.exmc.R11 = guest R11
> +	 * PACA.exmc.R12 = guest R12
> +	 * PACA.exmc.R13 = guest R2
> +	 *
> +	 */
> +
> +	/* Save registers */
> +
> +	std	r0, (PACA_EXMC+EX_SRR0)(r13)
> +	std	r9, (PACA_EXMC+EX_R3)(r13)
> +	std	r10, (PACA_EXMC+EX_LR)(r13)
> +	std	r11, (PACA_EXMC+EX_DAR)(r13)
> +
> +	/*
> +	 * In order to easily get the last instruction we got the
> +	 * #vmexit at, we exploit the fact that the virtual layout
> +	 * is still the same here, so we can just ld from the
> +	 * guest's PC address.
> +	 */
> +
> +	/* We only load the last instruction when it's safe */
> +	cmpwi	r12, BOOK3S_INTERRUPT_DATA_STORAGE
> +	beq	ld_last_inst
> +	cmpwi	r12, BOOK3S_INTERRUPT_PROGRAM
> +	beq	ld_last_inst
> +
> +	b	no_ld_last_inst
> +
> +ld_last_inst:
> +	/* Save off the guest instruction we're at */
> +	/*    1) enable paging for data */
> +	mfmsr	r9
> +	ori	r11, r9, MSR_DR			/* Enable paging for data */
> +	mtmsr	r11
> +	/*    2) fetch the instruction */
> +	lwz	r0, 0(r10)
> +	/*    3) disable paging again */
> +	mtmsr	r9
> +
> +no_ld_last_inst:
> +
> +	/* Restore bolted entries from the shadow and fix it along the way */
> +
> +	/* We don't store anything in entry 0, so we don't need to take care of that */
> +	slbia
> +	isync
> +
> +#if SLB_NUM_BOLTED == 3
> +
> +	ld	r11, PACA_SLBSHADOWPTR(r13)
> +
> +	ld	r10, 0x10(r11)
> +	cmpdi	r10, 0
> +	beq	slb_exit_skip_1
> +	oris	r10, r10, SLB_ESID_V@h
> +	ld	r9, 0x18(r11)
> +	slbmte	r9, r10
> +	std	r10, 0x10(r11)
> +slb_exit_skip_1:
> +	
> +	ld	r10, 0x20(r11)
> +	cmpdi	r10, 0
> +	beq	slb_exit_skip_2
> +	oris	r10, r10, SLB_ESID_V@h
> +	ld	r9, 0x28(r11)
> +	slbmte	r9, r10
> +	std	r10, 0x20(r11)
> +slb_exit_skip_2:
> +	
> +	ld	r10, 0x30(r11)
> +	cmpdi	r10, 0
> +	beq	slb_exit_skip_3
> +	oris	r10, r10, SLB_ESID_V@h
> +	ld	r9, 0x38(r11)
> +	slbmte	r9, r10
> +	std	r10, 0x30(r11)
> +slb_exit_skip_3:

Again, a macro here?

> +	
> +#else
> +#error unknown number of bolted entries
> +#endif
> +
> +slb_do_exit:
> +
> +	/* Restore registers */
> +
> +	ld	r11, (PACA_EXMC+EX_DAR)(r13)
> +	ld	r10, (PACA_EXMC+EX_LR)(r13)
> +	ld	r9, (PACA_EXMC+EX_R3)(r13)
> +
> +	/* Save last inst */
> +	stw	r0, (PACA_EXMC+EX_LR)(r13)
> +
> +	/* Save DAR and DSISR before going to paged mode */
> +	mfdar	r0
> +	std	r0, (PACA_EXMC+EX_DAR)(r13)
> +	mfdsisr	r0
> +	stw	r0, (PACA_EXMC+EX_DSISR)(r13)
> +
> +	/* RFI into the highmem handler */
> +	mfmsr	r0
> +	ori	r0, r0, MSR_IR|MSR_DR|MSR_RI	/* Enable paging */
> +	mtsrr1	r0
> +	ld	r0, PACASAVEDMSR(r13)		/* Highmem handler address */
> +	mtsrr0	r0
> +
> +	mfspr	r0, SPRN_SPRG_SCRATCH0
> +
> +	RFI
> +kvmppc_handler_trampoline_exit_end:
> +
> -- 
> 1.6.0.2
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 11/27] Add book3s_64 Host MMU handling
  2009-10-30 15:47   ` Alexander Graf
@ 2009-11-01 23:39       ` Michael Neuling
  -1 siblings, 0 replies; 244+ messages in thread
From: Michael Neuling @ 2009-11-01 23:39 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, Kevin Wolf, Arnd Bergmann,
	Hollis Blanchard, Marcelo Tosatti, kvm-ppc,
	linuxppc-dev-mnsaURCQ41sdnm+yROfE0A, Avi Kivity,
	bphilips-l3A5Bk7waGM, Olof Johansson

<snip>
> +static void invalidate_pte(struct hpte_cache *pte)
> +{
> +	dprintk_mmu("KVM: Flushing SPT %d: 0x%llx (0x%llx) -> 0x%llx\n",
> +		    i, pte->pte.eaddr, pte->pte.vpage, pte->host_va);
> +
> +	ppc_md.hpte_invalidate(pte->slot, pte->host_va,
> +			       MMU_PAGE_4K, MMU_SEGSIZE_256M,
> +			       false);

Are we assuming 256M segments here (and elsewhere)?

<snip>
> +static int kvmppc_mmu_next_segment(struct kvm_vcpu *vcpu, ulong esid)
> +{
> +	int i;
> +	int max_slb_size = 64;
> +	int found_inval = -1;
> +	int r;
> +
> +	if (!get_paca()->kvm_slb_max)
> +		get_paca()->kvm_slb_max = 1;
> +
> +	/* Are we overwriting? */
> +	for (i = 1; i < get_paca()->kvm_slb_max; i++) {
> +		if (!(get_paca()->kvm_slb[i].esid & SLB_ESID_V))
> +			found_inval = i;
> +		else if ((get_paca()->kvm_slb[i].esid & ESID_MASK) == esid)
> +			return i;
> +	}
> +
> +	/* Found a spare entry that was invalidated before */
> +	if (found_inval > 0)
> +		return found_inval;
> +
> +	/* No spare invalid entry, so create one */
> +
> +	if (mmu_slb_size < 64)
> +		max_slb_size = mmu_slb_size;

Can we just use the global mmu_slb_size and eliminate max_slb_size?
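If the 64-entry cap is still wanted (it presumably matches the size of
the PACA-side kvm_slb[] cache), the two lines at least collapse to
something like this sketch:

	int max_slb_size = min_t(int, mmu_slb_size, 64);

and if the cap is not needed, max_slb_size can go away entirely and
mmu_slb_size can be used directly at the use site.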

<snip>

Mikey

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 08/27] Add SLB switching code for entry/exit
  2009-11-01 23:23       ` Michael Neuling
@ 2009-11-02  9:23           ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-11-02  9:23 UTC (permalink / raw)
  To: Michael Neuling
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, Kevin Wolf, Arnd Bergmann,
	Hollis Blanchard, Marcelo Tosatti, kvm-ppc,
	linuxppc-dev-mnsaURCQ41sdnm+yROfE0A, Avi Kivity,
	bphilips-l3A5Bk7waGM, Olof Johansson


On 02.11.2009 at 00:23, Michael Neuling <mikey-/owAOxkjmzZAfugRpC6u6w@public.gmane.org> wrote:

>> This is the really low level of guest entry/exit code.
>>
>> Book3s_64 has an SLB, which stores all ESID -> VSID mappings we're
>> currently aware of.
>>
>> The segments in the guest differ from the ones on the host, so we  
>> need
>> to switch the SLB to tell the MMU that we're in a new context.
>>
>> So we store a shadow of the guest's SLB in the PACA, switch to that  
>> on
>> entry and only restore bolted entries on exit, leaving the rest to  
>> the
>> Linux SLB fault handler.
>>
>> That way we get a really clean way of switching the SLB.
>>
>> Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
>> ---
>> arch/powerpc/kvm/book3s_64_slb.S |  277 ++++++++++++++++++++++++++++++++++++++
>> 1 files changed, 277 insertions(+), 0 deletions(-)
>> create mode 100644 arch/powerpc/kvm/book3s_64_slb.S
>>
>> diff --git a/arch/powerpc/kvm/book3s_64_slb.S b/arch/powerpc/kvm/book3s_64_slb.S
>> new file mode 100644
>> index 0000000..00a8367
>> --- /dev/null
>> +++ b/arch/powerpc/kvm/book3s_64_slb.S
>> @@ -0,0 +1,277 @@
>> +/*
>> + * This program is free software; you can redistribute it and/or  
>> modify
>> + * it under the terms of the GNU General Public License, version  
>> 2, as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA   
>> 02110-1301, USA.
>> + *
>> + * Copyright SUSE Linux Products GmbH 2009
>> + *
>> + * Authors: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
>> + */
>> +
>> +/******************************************************************************
>> + *                                                                            *
>> + *                               Entry code                                   *
>> + *                                                                            *
>> + *****************************************************************************/
>> +
>> +.global kvmppc_handler_trampoline_enter
>> +kvmppc_handler_trampoline_enter:
>> +
>> +    /* Required state:
>> +     *
>> +     * MSR = ~IR|DR
>> +     * R13 = PACA
>> +     * R9 = guest IP
>> +     * R10 = guest MSR
>> +     * R11 = free
>> +     * R12 = free
>> +     * PACA[PACA_EXMC + EX_R9] = guest R9
>> +     * PACA[PACA_EXMC + EX_R10] = guest R10
>> +     * PACA[PACA_EXMC + EX_R11] = guest R11
>> +     * PACA[PACA_EXMC + EX_R12] = guest R12
>> +     * PACA[PACA_EXMC + EX_R13] = guest R13
>> +     * PACA[PACA_EXMC + EX_CCR] = guest CR
>> +     * PACA[PACA_EXMC + EX_R3] = guest XER
>> +     */
>> +
>> +    mtsrr0    r9
>> +    mtsrr1    r10
>> +
>> +    mtspr    SPRN_SPRG_SCRATCH0, r0
>> +
>> +    /* Remove LPAR shadow entries */
>> +
>> +#if SLB_NUM_BOLTED == 3
>
> You could alternatively check the persistent entry in the slb_shadow
> buffer.  This would give you a run time check.  Not sure what's best
> though.

Well we're in the hot path here, so anything using as few registers as  
possible and being simple is the best :-). I'd guess the more we are  
clever at compile time the better.

>
>
>> +
>> +    ld    r12, PACA_SLBSHADOWPTR(r13)
>> +    ld    r10, 0x10(r12)
>> +    ld    r11, 0x18(r12)
>
> Can you define something in asm-offsets.c for these magic constants  
> 0x10
> and 0x18.  Similarly below.
>
>> +    /* Invalid? Skip. */
>> +    rldicl. r0, r10, 37, 63
>> +    beq    slb_entry_skip_1
>> +    xoris    r9, r10, SLB_ESID_V@h
>> +    std    r9, 0x10(r12)
>> +slb_entry_skip_1:
>> +    ld    r9, 0x20(r12)
>> +    /* Invalid? Skip. */
>> +    rldicl. r0, r9, 37, 63
>> +    beq    slb_entry_skip_2
>> +    xoris    r9, r9, SLB_ESID_V@h
>> +    std    r9, 0x20(r12)
>> +slb_entry_skip_2:
>> +    ld    r9, 0x30(r12)
>> +    /* Invalid? Skip. */
>> +    rldicl. r0, r9, 37, 63
>> +    beq    slb_entry_skip_3
>> +    xoris    r9, r9, SLB_ESID_V@h
>> +    std    r9, 0x30(r12)
>
> Can these 3 be made into a macro?

Phew - dynamically generating jump points sounds rather hard. I can  
give it a try...

>
>> +slb_entry_skip_3:
>> +
>> +#else
>> +#error unknown number of bolted entries
>> +#endif
>> +
>> +    /* Flush SLB */
>> +
>> +    slbia
>> +
>> +    /* r0 = esid & ESID_MASK */
>> +    rldicr  r10, r10, 0, 35
>> +    /* r0 |= CLASS_BIT(VSID) */
>> +    rldic   r12, r11, 56 - 36, 36
>> +    or      r10, r10, r12
>> +    slbie    r10
>> +
>> +    isync
>> +
>> +    /* Fill SLB with our shadow */
>> +
>> +    lbz    r12, PACA_KVM_SLB_MAX(r13)
>> +    mulli    r12, r12, 16
>> +    addi    r12, r12, PACA_KVM_SLB
>> +    add    r12, r12, r13
>> +
>> +    /* for (r11 = kvm_slb; r11 < kvm_slb + kvm_slb_size;  
>> r11+=slb_entry) */
>> +    li    r11, PACA_KVM_SLB
>> +    add    r11, r11, r13
>> +
>> +slb_loop_enter:
>> +
>> +    ld    r10, 0(r11)
>> +
>> +    rldicl. r0, r10, 37, 63
>> +    beq    slb_loop_enter_skip
>> +
>> +    ld    r9, 8(r11)
>> +    slbmte    r9, r10
>
> If you're updating the first 3 slbs, you need to make sure the slb
> shadow is updated at the same time

Well - what happens if we don't? We'd get a segment fault when phyp  
stole our entry! So what? Let it fault, see the mapping is already  
there and get back in again :-).

> (BTW dumb question: can we run this
> under PHYP?)

Yes, I tested it on bare metal, phyp and a PS3.


Alex

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 11/27] Add book3s_64 Host MMU handling
  2009-11-01 23:39       ` Michael Neuling
@ 2009-11-02  9:26         ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-11-02  9:26 UTC (permalink / raw)
  To: Michael Neuling
  Cc: kvm, Kevin Wolf, Arnd Bergmann, Hollis Blanchard,
	Marcelo Tosatti, kvm-ppc, linuxppc-dev, Avi Kivity, bphilips,
	Olof Johansson


On 02.11.2009 at 00:39, Michael Neuling <mikey@neuling.org> wrote:

> <snip>
>> +static void invalidate_pte(struct hpte_cache *pte)
>> +{
>> +    dprintk_mmu("KVM: Flushing SPT %d: 0x%llx (0x%llx) -> 0x%llx\n",
>> +            i, pte->pte.eaddr, pte->pte.vpage, pte->host_va);
>> +
>> +    ppc_md.hpte_invalidate(pte->slot, pte->host_va,
>> +                   MMU_PAGE_4K, MMU_SEGSIZE_256M,
>> +                   false);
>
> Are we assuming 256M segments here (and elsewhere)?

Yes, on the host we only create 256MB segments. What the guest uses is  
a different question.

>
> <snip>
>> +static int kvmppc_mmu_next_segment(struct kvm_vcpu *vcpu, ulong  
>> esid)
>> +{
>> +    int i;
>> +    int max_slb_size = 64;
>> +    int found_inval = -1;
>> +    int r;
>> +
>> +    if (!get_paca()->kvm_slb_max)
>> +        get_paca()->kvm_slb_max = 1;
>> +
>> +    /* Are we overwriting? */
>> +    for (i = 1; i < get_paca()->kvm_slb_max; i++) {
>> +        if (!(get_paca()->kvm_slb[i].esid & SLB_ESID_V))
>> +            found_inval = i;
>> +        else if ((get_paca()->kvm_slb[i].esid & ESID_MASK) == esid)
>> +            return i;
>> +    }
>> +
>> +    /* Found a spare entry that was invalidated before */
>> +    if (found_inval > 0)
>> +        return found_inval;
>> +
>> +    /* No spare invalid entry, so create one */
>> +
>> +    if (mmu_slb_size < 64)
>> +        max_slb_size = mmu_slb_size;
>
> Can we just use the global mmu_slb_size and eliminate max_slb_size?

Hm, for a strange reason I wanted to have at most 64 slb entries.  
Maybe the struct can't hold more? I'll check again as soon as I'm on a  
notebook again.
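For reference, the PACA-side cache that imposes the limit looks roughly
like this (sketch with assumed sizes and names, not the exact layout
added to the paca struct elsewhere in the series):

	/* per-entry shadow SLB cache, 16 bytes each as the asm expects */
	struct {
		u64	esid;
		u64	vsid;
	} kvm_slb[64];
	u8	kvm_slb_max;	/* number of used slots; slot 0 stays unused */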

Alex

>
> <snip>
>
> Mikey
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 08/27] Add SLB switching code for entry/exit
  2009-11-02  9:23           ` Alexander Graf
@ 2009-11-02  9:39               ` Michael Neuling
  -1 siblings, 0 replies; 244+ messages in thread
From: Michael Neuling @ 2009-11-02  9:39 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, Kevin Wolf, Arnd Bergmann,
	Hollis Blanchard, Marcelo Tosatti, kvm-ppc,
	linuxppc-dev-mnsaURCQ41sdnm+yROfE0A, Avi Kivity,
	bphilips-l3A5Bk7waGM, Olof Johansson

> >> This is the really low level of guest entry/exit code.
> >>
> >> Book3s_64 has an SLB, which stores all ESID -> VSID mappings we're
> >> currently aware of.
> >>
> >> The segments in the guest differ from the ones on the host, so we  
> >> need
> >> to switch the SLB to tell the MMU that we're in a new context.
> >>
> >> So we store a shadow of the guest's SLB in the PACA, switch to that  
> >> on
> >> entry and only restore bolted entries on exit, leaving the rest to  
> >> the
> >> Linux SLB fault handler.
> >>
> >> That way we get a really clean way of switching the SLB.
> >>
> >> Signed-off-by: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
> >> ---
> >> arch/powerpc/kvm/book3s_64_slb.S |  277 ++++++++++++++++++++++++++++++++++++++
> >> 1 files changed, 277 insertions(+), 0 deletions(-)
> >> create mode 100644 arch/powerpc/kvm/book3s_64_slb.S
> >>
> >> diff --git a/arch/powerpc/kvm/book3s_64_slb.S b/arch/powerpc/kvm/book3s_64_slb.S
> >> new file mode 100644
> >> index 0000000..00a8367
> >> --- /dev/null
> >> +++ b/arch/powerpc/kvm/book3s_64_slb.S
> >> @@ -0,0 +1,277 @@
> >> +/*
> >> + * This program is free software; you can redistribute it and/or  
> >> modify
> >> + * it under the terms of the GNU General Public License, version  
> >> 2, as
> >> + * published by the Free Software Foundation.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License
> >> + * along with this program; if not, write to the Free Software
> >> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA   
> >> 02110-1301, USA.
> >> + *
> >> + * Copyright SUSE Linux Products GmbH 2009
> >> + *
> >> + * Authors: Alexander Graf <agraf-l3A5Bk7waGM@public.gmane.org>
> >> + */
> >> +
> >> +/******************************************************************************
> >> + *                                                                            *
> >> + *                               Entry code                                   *
> >> + *                                                                            *
> >> + *****************************************************************************/
> >> +
> >> +.global kvmppc_handler_trampoline_enter
> >> +kvmppc_handler_trampoline_enter:
> >> +
> >> +    /* Required state:
> >> +     *
> >> +     * MSR = ~IR|DR
> >> +     * R13 = PACA
> >> +     * R9 = guest IP
> >> +     * R10 = guest MSR
> >> +     * R11 = free
> >> +     * R12 = free
> >> +     * PACA[PACA_EXMC + EX_R9] = guest R9
> >> +     * PACA[PACA_EXMC + EX_R10] = guest R10
> >> +     * PACA[PACA_EXMC + EX_R11] = guest R11
> >> +     * PACA[PACA_EXMC + EX_R12] = guest R12
> >> +     * PACA[PACA_EXMC + EX_R13] = guest R13
> >> +     * PACA[PACA_EXMC + EX_CCR] = guest CR
> >> +     * PACA[PACA_EXMC + EX_R3] = guest XER
> >> +     */
> >> +
> >> +    mtsrr0    r9
> >> +    mtsrr1    r10
> >> +
> >> +    mtspr    SPRN_SPRG_SCRATCH0, r0
> >> +
> >> +    /* Remove LPAR shadow entries */
> >> +
> >> +#if SLB_NUM_BOLTED == 3
> >
> > You could alternatively check the persistent entry in the slb_shadow
> > buffer.  This would give you a run time check.  Not sure what's best
> > though.
> 
> Well we're in the hot path here, so anything using as few registers as  
> possible and being simple is the best :-). I'd guess the more we are  
> clever at compile time the better.

Yeah, I tend to agree.

> 
> >
> >
> >> +
> >> +    ld    r12, PACA_SLBSHADOWPTR(r13)
> >> +    ld    r10, 0x10(r12)
> >> +    ld    r11, 0x18(r12)
> >
> > Can you define something in asm-offsets.c for these magic constants  
> > 0x10
> > and 0x18.  Similarly below.
> >
> >> +    /* Invalid? Skip. */
> >> +    rldicl. r0, r10, 37, 63
> >> +    beq    slb_entry_skip_1
> >> +    xoris    r9, r10, SLB_ESID_V@h
> >> +    std    r9, 0x10(r12)
> >> +slb_entry_skip_1:
> >> +    ld    r9, 0x20(r12)
> >> +    /* Invalid? Skip. */
> >> +    rldicl. r0, r9, 37, 63
> >> +    beq    slb_entry_skip_2
> >> +    xoris    r9, r9, SLB_ESID_V@h
> >> +    std    r9, 0x20(r12)
> >> +slb_entry_skip_2:
> >> +    ld    r9, 0x30(r12)
> >> +    /* Invalid? Skip. */
> >> +    rldicl. r0, r9, 37, 63
> >> +    beq    slb_entry_skip_3
> >> +    xoris    r9, r9, SLB_ESID_V@h
> >> +    std    r9, 0x30(r12)
> >
> > Can these 3 be made into a macro?
> 
> Phew - dynamically generating jump points sounds rather hard. I can  
> give it a try...
> 
> >
> >> +slb_entry_skip_3:
> >> +
> >> +#else
> >> +#error unknown number of bolted entries
> >> +#endif
> >> +
> >> +    /* Flush SLB */
> >> +
> >> +    slbia
> >> +
> >> +    /* r0 = esid & ESID_MASK */
> >> +    rldicr  r10, r10, 0, 35
> >> +    /* r0 |= CLASS_BIT(VSID) */
> >> +    rldic   r12, r11, 56 - 36, 36
> >> +    or      r10, r10, r12
> >> +    slbie    r10
> >> +
> >> +    isync
> >> +
> >> +    /* Fill SLB with our shadow */
> >> +
> >> +    lbz    r12, PACA_KVM_SLB_MAX(r13)
> >> +    mulli    r12, r12, 16
> >> +    addi    r12, r12, PACA_KVM_SLB
> >> +    add    r12, r12, r13
> >> +
> >> +    /* for (r11 = kvm_slb; r11 < kvm_slb + kvm_slb_size;  
> >> r11+=slb_entry) */
> >> +    li    r11, PACA_KVM_SLB
> >> +    add    r11, r11, r13
> >> +
> >> +slb_loop_enter:
> >> +
> >> +    ld    r10, 0(r11)
> >> +
> >> +    rldicl. r0, r10, 37, 63
> >> +    beq    slb_loop_enter_skip
> >> +
> >> +    ld    r9, 8(r11)
> >> +    slbmte    r9, r10
> >
> > If you're updating the first 3 slbs, you need to make sure the slb
> > shadow is updated at the same time
> 
> Well - what happens if we don't? We'd get a segment fault when phyp  
> stole our entry! So what? Let it fault, see the mapping is already  
> there and get back in again :-).

The problem is you won't take the segment fault as PHYP may put a valid
entry in there.  PHYP will put back what's in the shadow buffer, which
could be valid hence no segment fault.

> > (BTW dumb question: can we run this
> > under PHYP?)
> 
> Yes, I tested it on bare metal, phyp and a PS3.

Nice!

Mikey

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 08/27] Add SLB switching code for entry/exit
  2009-11-02  9:39               ` Michael Neuling
@ 2009-11-02  9:59                 ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-11-02  9:59 UTC (permalink / raw)
  To: Michael Neuling
  Cc: kvm, Kevin Wolf, Arnd Bergmann, Hollis Blanchard,
	Marcelo Tosatti, kvm-ppc, linuxppc-dev, Avi Kivity, bphilips,
	Olof Johansson


On 02.11.2009 at 10:39, Michael Neuling <mikey@neuling.org> wrote:

>>>> This is the really low level of guest entry/exit code.
>>>>
>>>> Book3s_64 has an SLB, which stores all ESID -> VSID mappings we're
>>>> currently aware of.
>>>>
>>>> The segments in the guest differ from the ones on the host, so we
>>>> need
>>>> to switch the SLB to tell the MMU that we're in a new context.
>>>>
>>>> So we store a shadow of the guest's SLB in the PACA, switch to that
>>>> on
>>>> entry and only restore bolted entries on exit, leaving the rest to
>>>> the
>>>> Linux SLB fault handler.
>>>>
>>>> That way we get a really clean way of switching the SLB.
>>>>
>>>> Signed-off-by: Alexander Graf <agraf@suse.de>
>>>> ---
>>>> arch/powerpc/kvm/book3s_64_slb.S |  277 ++++++++++++++++++++++++++++++++++++++
>>>> 1 files changed, 277 insertions(+), 0 deletions(-)
>>>> create mode 100644 arch/powerpc/kvm/book3s_64_slb.S
>>>>
>>>> diff --git a/arch/powerpc/kvm/book3s_64_slb.S b/arch/powerpc/kvm/book3s_64_slb.S
>>>> new file mode 100644
>>>> index 0000000..00a8367
>>>> --- /dev/null
>>>> +++ b/arch/powerpc/kvm/book3s_64_slb.S
>>>> @@ -0,0 +1,277 @@
>>>> +/*
>>>> + * This program is free software; you can redistribute it and/or
>>>> modify
>>>> + * it under the terms of the GNU General Public License, version
>>>> 2, as
>>>> + * published by the Free Software Foundation.
>>>> + *
>>>> + * This program is distributed in the hope that it will be useful,
>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>> + * GNU General Public License for more details.
>>>> + *
>>>> + * You should have received a copy of the GNU General Public  
>>>> License
>>>> + * along with this program; if not, write to the Free Software
>>>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA
>>>> 02110-1301, USA.
>>>> + *
>>>> + * Copyright SUSE Linux Products GmbH 2009
>>>> + *
>>>> + * Authors: Alexander Graf <agraf@suse.de>
>>>> + */
>>>> +
>>>> +/******************************************************************************
>>>> + *                                                                            *
>>>> + *                               Entry code                                   *
>>>> + *                                                                            *
>>>> + *****************************************************************************/
>>>> +
>>>> +.global kvmppc_handler_trampoline_enter
>>>> +kvmppc_handler_trampoline_enter:
>>>> +
>>>> +    /* Required state:
>>>> +     *
>>>> +     * MSR = ~IR|DR
>>>> +     * R13 = PACA
>>>> +     * R9 = guest IP
>>>> +     * R10 = guest MSR
>>>> +     * R11 = free
>>>> +     * R12 = free
>>>> +     * PACA[PACA_EXMC + EX_R9] = guest R9
>>>> +     * PACA[PACA_EXMC + EX_R10] = guest R10
>>>> +     * PACA[PACA_EXMC + EX_R11] = guest R11
>>>> +     * PACA[PACA_EXMC + EX_R12] = guest R12
>>>> +     * PACA[PACA_EXMC + EX_R13] = guest R13
>>>> +     * PACA[PACA_EXMC + EX_CCR] = guest CR
>>>> +     * PACA[PACA_EXMC + EX_R3] = guest XER
>>>> +     */
>>>> +
>>>> +    mtsrr0    r9
>>>> +    mtsrr1    r10
>>>> +
>>>> +    mtspr    SPRN_SPRG_SCRATCH0, r0
>>>> +
>>>> +    /* Remove LPAR shadow entries */
>>>> +
>>>> +#if SLB_NUM_BOLTED == 3
>>>
>>> You could alternatively check the persistent entry in the  
>>> slb_shadow
>>> buffer.  This would give you a run time check.  Not sure what's best
>>> though.
>>
>> Well we're in the hot path here, so anything using as few registers  
>> as
>> possible and being simple is the best :-). I'd guess the more we are
>> clever at compile time the better.
>
> Yeah, I tend to agree.
>
>>
>>>
>>>
>>>> +
>>>> +    ld    r12, PACA_SLBSHADOWPTR(r13)
>>>> +    ld    r10, 0x10(r12)
>>>> +    ld    r11, 0x18(r12)
>>>
>>> Can you define something in asm-offsets.c for these magic constants
>>> 0x10
>>> and 0x18.  Similarly below.
>>>
>>>> +    /* Invalid? Skip. */
>>>> +    rldicl. r0, r10, 37, 63
>>>> +    beq    slb_entry_skip_1
>>>> +    xoris    r9, r10, SLB_ESID_V@h
>>>> +    std    r9, 0x10(r12)
>>>> +slb_entry_skip_1:
>>>> +    ld    r9, 0x20(r12)
>>>> +    /* Invalid? Skip. */
>>>> +    rldicl. r0, r9, 37, 63
>>>> +    beq    slb_entry_skip_2
>>>> +    xoris    r9, r9, SLB_ESID_V@h
>>>> +    std    r9, 0x20(r12)
>>>> +slb_entry_skip_2:
>>>> +    ld    r9, 0x30(r12)
>>>> +    /* Invalid? Skip. */
>>>> +    rldicl. r0, r9, 37, 63
>>>> +    beq    slb_entry_skip_3
>>>> +    xoris    r9, r9, SLB_ESID_V@h
>>>> +    std    r9, 0x30(r12)
>>>
>>> Can these 3 be made into a macro?
>>
>> Phew - dynamically generating jump points sounds rather hard. I can
>> give it a try...
>>
>>>
>>>> +slb_entry_skip_3:
>>>> +
>>>> +#else
>>>> +#error unknown number of bolted entries
>>>> +#endif
>>>> +
>>>> +    /* Flush SLB */
>>>> +
>>>> +    slbia
>>>> +
>>>> +    /* r0 = esid & ESID_MASK */
>>>> +    rldicr  r10, r10, 0, 35
>>>> +    /* r0 |= CLASS_BIT(VSID) */
>>>> +    rldic   r12, r11, 56 - 36, 36
>>>> +    or      r10, r10, r12
>>>> +    slbie    r10
>>>> +
>>>> +    isync
>>>> +
>>>> +    /* Fill SLB with our shadow */
>>>> +
>>>> +    lbz    r12, PACA_KVM_SLB_MAX(r13)
>>>> +    mulli    r12, r12, 16
>>>> +    addi    r12, r12, PACA_KVM_SLB
>>>> +    add    r12, r12, r13
>>>> +
>>>> +    /* for (r11 = kvm_slb; r11 < kvm_slb + kvm_slb_size;
>>>> r11+=slb_entry) */
>>>> +    li    r11, PACA_KVM_SLB
>>>> +    add    r11, r11, r13
>>>> +
>>>> +slb_loop_enter:
>>>> +
>>>> +    ld    r10, 0(r11)
>>>> +
>>>> +    rldicl. r0, r10, 37, 63
>>>> +    beq    slb_loop_enter_skip
>>>> +
>>>> +    ld    r9, 8(r11)
>>>> +    slbmte    r9, r10
>>>
>>> If you're updating the first 3 slbs, you need to make sure the slb
>>> shadow is updated at the same time
>>
>> Well - what happens if we don't? We'd get a segment fault when phyp
>> stole our entry! So what? Let it fault, see the mapping is already
>> there and get back in again :-).
>
> The problem is you won't take the segment fault as PHYP may put a  
> valid
> entry in there.  PHYP will put back what's in the shadow buffer, which
> could be valid hence no segment fault.

The shadow buffer contains V=0 entries :).

Alex
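
A minimal sketch of the asm-offsets.c constants asked for above -- the names
are made up here, only the offsets into struct slb_shadow are meant to be real:

	/* hypothetical additions to arch/powerpc/kernel/asm-offsets.c */
	DEFINE(SLBSHADOW_SAVEAREA_ESID,
	       offsetof(struct slb_shadow, save_area[0].esid));
	DEFINE(SLBSHADOW_SAVEAREA_VSID,
	       offsetof(struct slb_shadow, save_area[0].vsid));
	DEFINE(SLB_SHADOW_ENTRY, sizeof(((struct slb_shadow *)0)->save_area[0]));

With those, the 0x10/0x18/0x20/0x30 loads become SLBSHADOW_SAVEAREA_ESID/_VSID
plus a multiple of SLB_SHADOW_ENTRY.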

>
>>> (BTW dumb question: can we run this
>>> under PHYP?)
>>
>> Yes, I tested it on bare metal, phyp and a PS3.
>
> Nice!
>
> Mikey
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
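
On the macro question: GAS local labels (1:/1f) sidestep the unique jump name
problem, so the three invalidate sequences could be folded into something like
the sketch below. Note it uses r9 as scratch throughout and so does not keep
the first entry's ESID/VSID live in r10/r11 the way the original code does:

	.macro SLB_SHADOW_CLEAR_V offset
	ld	r9, \offset(r12)
	/* Invalid? Skip. */
	rldicl.	r0, r9, 37, 63
	beq	1f
	xoris	r9, r9, SLB_ESID_V@h
	std	r9, \offset(r12)
1:
	.endm

	SLB_SHADOW_CLEAR_V 0x10
	SLB_SHADOW_CLEAR_V 0x20
	SLB_SHADOW_CLEAR_V 0x30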

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 08/27] Add SLB switching code for entry/exit
@ 2009-11-02  9:59                 ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-11-02  9:59 UTC (permalink / raw)
  To: Michael Neuling
  Cc: Kevin Wolf, Arnd Bergmann, Hollis Blanchard, Marcelo Tosatti,
	kvm-ppc, linuxppc-dev, Avi Kivity, kvm, bphilips, Olof Johansson


Am 02.11.2009 um 10:39 schrieb Michael Neuling <mikey@neuling.org>:

>>>> This is the really low level of guest entry/exit code.
>>>>
>>>> Book3s_64 has an SLB, which stores all ESID -> VSID mappings we're
>>>> currently aware of.
>>>>
>>>> The segments in the guest differ from the ones on the host, so we
>>>> need
>>>> to switch the SLB to tell the MMU that we're in a new context.
>>>>
>>>> So we store a shadow of the guest's SLB in the PACA, switch to that
>>>> on
>>>> entry and only restore bolted entries on exit, leaving the rest to
>>>> the
>>>> Linux SLB fault handler.
>>>>
>>>> That way we get a really clean way of switching the SLB.
>>>>
>>>> Signed-off-by: Alexander Graf <agraf@suse.de>
>>>> ---
>>>> arch/powerpc/kvm/book3s_64_slb.S |  277 ++++++++++++++++++++++++++ 
>>>> ++
>>>> ++++++++
>>> ++
>>>> 1 files changed, 277 insertions(+), 0 deletions(-)
>>>> create mode 100644 arch/powerpc/kvm/book3s_64_slb.S
>>>>
>>>> diff --git a/arch/powerpc/kvm/book3s_64_slb.S b/arch/powerpc/kvm/
>>>> book3s_64_sl
>>> b.S
>>>> new file mode 100644
>>>> index 0000000..00a8367
>>>> --- /dev/null
>>>> +++ b/arch/powerpc/kvm/book3s_64_slb.S
>>>> @@ -0,0 +1,277 @@
>>>> +/*
>>>> + * This program is free software; you can redistribute it and/or
>>>> modify
>>>> + * it under the terms of the GNU General Public License, version
>>>> 2, as
>>>> + * published by the Free Software Foundation.
>>>> + *
>>>> + * This program is distributed in the hope that it will be useful,
>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>> + * GNU General Public License for more details.
>>>> + *
>>>> + * You should have received a copy of the GNU General Public  
>>>> License
>>>> + * along with this program; if not, write to the Free Software
>>>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA
>>>> 02110-1301, USA.
>>>> + *
>>>> + * Copyright SUSE Linux Products GmbH 2009
>>>> + *
>>>> + * Authors: Alexander Graf <agraf@suse.de>
>>>> + */
>>>> +
>>>> +/
>>>> ***
>>>> ***
>>>> *** 
>>>> ******************************************************************
>>> ***
>>>> + *
>>> *
>>>> + *                               Entry code
>>> *
>>>> + *
>>> *
>>>> +
>>>> ***
>>>> ***
>>>> *** 
>>>> ******************************************************************
>>> **/
>>>> +
>>>> +.global kvmppc_handler_trampoline_enter
>>>> +kvmppc_handler_trampoline_enter:
>>>> +
>>>> +    /* Required state:
>>>> +     *
>>>> +     * MSR = ~IR|DR
>>>> +     * R13 = PACA
>>>> +     * R9 = guest IP
>>>> +     * R10 = guest MSR
>>>> +     * R11 = free
>>>> +     * R12 = free
>>>> +     * PACA[PACA_EXMC + EX_R9] = guest R9
>>>> +     * PACA[PACA_EXMC + EX_R10] = guest R10
>>>> +     * PACA[PACA_EXMC + EX_R11] = guest R11
>>>> +     * PACA[PACA_EXMC + EX_R12] = guest R12
>>>> +     * PACA[PACA_EXMC + EX_R13] = guest R13
>>>> +     * PACA[PACA_EXMC + EX_CCR] = guest CR
>>>> +     * PACA[PACA_EXMC + EX_R3] = guest XER
>>>> +     */
>>>> +
>>>> +    mtsrr0    r9
>>>> +    mtsrr1    r10
>>>> +
>>>> +    mtspr    SPRN_SPRG_SCRATCH0, r0
>>>> +
>>>> +    /* Remove LPAR shadow entries */
>>>> +
>>>> +#if SLB_NUM_BOLTED == 3
>>>
>>> You could alternatively check the persistent entry in the  
>>> slb_shadow
>>> buffer.  This would give you a run time check.  Not sure what's best
>>> though.
>>
>> Well we're in the hot path here, so anything using as few registers  
>> as
>> possible and being simple is the best :-). I'd guess the more we are
>> clever at compile time the better.
>
> Yeah, I tend to agree.
>
>>
>>>
>>>
>>>> +
>>>> +    ld    r12, PACA_SLBSHADOWPTR(r13)
>>>> +    ld    r10, 0x10(r12)
>>>> +    ld    r11, 0x18(r12)
>>>
>>> Can you define something in asm-offsets.c for these magic constants
>>> 0x10
>>> and 0x18.  Similarly below.
>>>
>>>> +    /* Invalid? Skip. */
>>>> +    rldicl. r0, r10, 37, 63
>>>> +    beq    slb_entry_skip_1
>>>> +    xoris    r9, r10, SLB_ESID_V@h
>>>> +    std    r9, 0x10(r12)
>>>> +slb_entry_skip_1:
>>>> +    ld    r9, 0x20(r12)
>>>> +    /* Invalid? Skip. */
>>>> +    rldicl. r0, r9, 37, 63
>>>> +    beq    slb_entry_skip_2
>>>> +    xoris    r9, r9, SLB_ESID_V@h
>>>> +    std    r9, 0x20(r12)
>>>> +slb_entry_skip_2:
>>>> +    ld    r9, 0x30(r12)
>>>> +    /* Invalid? Skip. */
>>>> +    rldicl. r0, r9, 37, 63
>>>> +    beq    slb_entry_skip_3
>>>> +    xoris    r9, r9, SLB_ESID_V@h
>>>> +    std    r9, 0x30(r12)
>>>
>>> Can these 3 be made into a macro?
>>
>> Phew - dynamically generating jump points sounds rather hard. I can
>> give it a try...
>>
>>>
>>>> +slb_entry_skip_3:
>>>> +
>>>> +#else
>>>> +#error unknown number of bolted entries
>>>> +#endif
>>>> +
>>>> +    /* Flush SLB */
>>>> +
>>>> +    slbia
>>>> +
>>>> +    /* r0 = esid & ESID_MASK */
>>>> +    rldicr  r10, r10, 0, 35
>>>> +    /* r0 |= CLASS_BIT(VSID) */
>>>> +    rldic   r12, r11, 56 - 36, 36
>>>> +    or      r10, r10, r12
>>>> +    slbie    r10
>>>> +
>>>> +    isync
>>>> +
>>>> +    /* Fill SLB with our shadow */
>>>> +
>>>> +    lbz    r12, PACA_KVM_SLB_MAX(r13)
>>>> +    mulli    r12, r12, 16
>>>> +    addi    r12, r12, PACA_KVM_SLB
>>>> +    add    r12, r12, r13
>>>> +
>>>> +    /* for (r11 = kvm_slb; r11 < kvm_slb + kvm_slb_size;
>>>> r11+=slb_entry) */
>>>> +    li    r11, PACA_KVM_SLB
>>>> +    add    r11, r11, r13
>>>> +
>>>> +slb_loop_enter:
>>>> +
>>>> +    ld    r10, 0(r11)
>>>> +
>>>> +    rldicl. r0, r10, 37, 63
>>>> +    beq    slb_loop_enter_skip
>>>> +
>>>> +    ld    r9, 8(r11)
>>>> +    slbmte    r9, r10
>>>
>>> If you're updating the first 3 slbs, you need to make sure the slb
>>> shadow is updated at the same time
>>
>> Well - what happens if we don't? We'd get a segment fault when phyp
>> stole our entry! So what? Let it fault, see the mapping is already
>> there and get back in again :-).
>
> The problem is you won't take the segment fault as PHYP may put a  
> valid
> entry in there.  PHYP will put back what's in the shadow buffer, which
> could be valid hence no segment fault.

The shadow buffer contains V=0 entries :).

Alex

>
>>> (BTW dumb question: can we run this
>>> under PHYP?)
>>
>> Yes, I tested it on bare metal, phyp and a PS3.
>
> Nice!
>
> Mikey
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 08/27] Add SLB switching code for entry/exit
@ 2009-11-02  9:59                 ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-11-02  9:59 UTC (permalink / raw)
  To: Michael Neuling
  Cc: kvm, Kevin Wolf, Arnd Bergmann, Hollis Blanchard,
	Marcelo Tosatti, kvm-ppc, linuxppc-dev, Avi Kivity, bphilips,
	Olof Johansson


Am 02.11.2009 um 10:39 schrieb Michael Neuling <mikey@neuling.org>:

>>>> This is the really low level of guest entry/exit code.
>>>>
>>>> Book3s_64 has an SLB, which stores all ESID -> VSID mappings we're
>>>> currently aware of.
>>>>
>>>> The segments in the guest differ from the ones on the host, so we
>>>> need
>>>> to switch the SLB to tell the MMU that we're in a new context.
>>>>
>>>> So we store a shadow of the guest's SLB in the PACA, switch to that
>>>> on
>>>> entry and only restore bolted entries on exit, leaving the rest to
>>>> the
>>>> Linux SLB fault handler.
>>>>
>>>> That way we get a really clean way of switching the SLB.
>>>>
>>>> Signed-off-by: Alexander Graf <agraf@suse.de>
>>>> ---
>>>> arch/powerpc/kvm/book3s_64_slb.S |  277 ++++++++++++++++++++++++++ 
>>>> ++
>>>> ++++++++
>>> ++
>>>> 1 files changed, 277 insertions(+), 0 deletions(-)
>>>> create mode 100644 arch/powerpc/kvm/book3s_64_slb.S
>>>>
>>>> diff --git a/arch/powerpc/kvm/book3s_64_slb.S b/arch/powerpc/kvm/
>>>> book3s_64_sl
>>> b.S
>>>> new file mode 100644
>>>> index 0000000..00a8367
>>>> --- /dev/null
>>>> +++ b/arch/powerpc/kvm/book3s_64_slb.S
>>>> @@ -0,0 +1,277 @@
>>>> +/*
>>>> + * This program is free software; you can redistribute it and/or
>>>> modify
>>>> + * it under the terms of the GNU General Public License, version
>>>> 2, as
>>>> + * published by the Free Software Foundation.
>>>> + *
>>>> + * This program is distributed in the hope that it will be useful,
>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>> + * GNU General Public License for more details.
>>>> + *
>>>> + * You should have received a copy of the GNU General Public  
>>>> License
>>>> + * along with this program; if not, write to the Free Software
>>>> + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA
>>>> 02110-1301, USA.
>>>> + *
>>>> + * Copyright SUSE Linux Products GmbH 2009
>>>> + *
>>>> + * Authors: Alexander Graf <agraf@suse.de>
>>>> + */
>>>> +
>>>> +/
>>>> ***
>>>> ***
>>>> *** 
>>>> ******************************************************************
>>> ***
>>>> + *
>>> *
>>>> + *                               Entry code
>>> *
>>>> + *
>>> *
>>>> +
>>>> ***
>>>> ***
>>>> *** 
>>>> ******************************************************************
>>> **/
>>>> +
>>>> +.global kvmppc_handler_trampoline_enter
>>>> +kvmppc_handler_trampoline_enter:
>>>> +
>>>> +    /* Required state:
>>>> +     *
>>>> +     * MSR = ~IR|DR
>>>> +     * R13 = PACA
>>>> +     * R9 = guest IP
>>>> +     * R10 = guest MSR
>>>> +     * R11 = free
>>>> +     * R12 = free
>>>> +     * PACA[PACA_EXMC + EX_R9] = guest R9
>>>> +     * PACA[PACA_EXMC + EX_R10] = guest R10
>>>> +     * PACA[PACA_EXMC + EX_R11] = guest R11
>>>> +     * PACA[PACA_EXMC + EX_R12] = guest R12
>>>> +     * PACA[PACA_EXMC + EX_R13] = guest R13
>>>> +     * PACA[PACA_EXMC + EX_CCR] = guest CR
>>>> +     * PACA[PACA_EXMC + EX_R3] = guest XER
>>>> +     */
>>>> +
>>>> +    mtsrr0    r9
>>>> +    mtsrr1    r10
>>>> +
>>>> +    mtspr    SPRN_SPRG_SCRATCH0, r0
>>>> +
>>>> +    /* Remove LPAR shadow entries */
>>>> +
>>>> +#if SLB_NUM_BOLTED == 3
>>>
>>> You could alternatively check the persistent entry in the  
>>> slb_shadow
>>> buffer.  This would give you a run time check.  Not sure what's best
>>> though.
>>
>> Well we're in the hot path here, so anything using as few registers  
>> as
>> possible and being simple is the best :-). I'd guess the more we are
>> clever at compile time the better.
>
> Yeah, I tend to agree.
>
>>
>>>
>>>
>>>> +
>>>> +    ld    r12, PACA_SLBSHADOWPTR(r13)
>>>> +    ld    r10, 0x10(r12)
>>>> +    ld    r11, 0x18(r12)
>>>
>>> Can you define something in asm-offsets.c for these magic constants
>>> 0x10
>>> and 0x18.  Similarly below.
>>>
>>>> +    /* Invalid? Skip. */
>>>> +    rldicl. r0, r10, 37, 63
>>>> +    beq    slb_entry_skip_1
>>>> +    xoris    r9, r10, SLB_ESID_V@h
>>>> +    std    r9, 0x10(r12)
>>>> +slb_entry_skip_1:
>>>> +    ld    r9, 0x20(r12)
>>>> +    /* Invalid? Skip. */
>>>> +    rldicl. r0, r9, 37, 63
>>>> +    beq    slb_entry_skip_2
>>>> +    xoris    r9, r9, SLB_ESID_V@h
>>>> +    std    r9, 0x20(r12)
>>>> +slb_entry_skip_2:
>>>> +    ld    r9, 0x30(r12)
>>>> +    /* Invalid? Skip. */
>>>> +    rldicl. r0, r9, 37, 63
>>>> +    beq    slb_entry_skip_3
>>>> +    xoris    r9, r9, SLB_ESID_V@h
>>>> +    std    r9, 0x30(r12)
>>>
>>> Can these 3 be made into a macro?
>>
>> Phew - dynamically generating jump points sounds rather hard. I can
>> give it a try...
>>
>>>
>>>> +slb_entry_skip_3:
>>>> +
>>>> +#else
>>>> +#error unknown number of bolted entries
>>>> +#endif
>>>> +
>>>> +    /* Flush SLB */
>>>> +
>>>> +    slbia
>>>> +
>>>> +    /* r0 = esid & ESID_MASK */
>>>> +    rldicr  r10, r10, 0, 35
>>>> +    /* r0 |= CLASS_BIT(VSID) */
>>>> +    rldic   r12, r11, 56 - 36, 36
>>>> +    or      r10, r10, r12
>>>> +    slbie    r10
>>>> +
>>>> +    isync
>>>> +
>>>> +    /* Fill SLB with our shadow */
>>>> +
>>>> +    lbz    r12, PACA_KVM_SLB_MAX(r13)
>>>> +    mulli    r12, r12, 16
>>>> +    addi    r12, r12, PACA_KVM_SLB
>>>> +    add    r12, r12, r13
>>>> +
>>>> +    /* for (r11 = kvm_slb; r11 < kvm_slb + kvm_slb_size;
>>>> r11+=slb_entry) */
>>>> +    li    r11, PACA_KVM_SLB
>>>> +    add    r11, r11, r13
>>>> +
>>>> +slb_loop_enter:
>>>> +
>>>> +    ld    r10, 0(r11)
>>>> +
>>>> +    rldicl. r0, r10, 37, 63
>>>> +    beq    slb_loop_enter_skip
>>>> +
>>>> +    ld    r9, 8(r11)
>>>> +    slbmte    r9, r10
>>>
>>> If you're updating the first 3 slbs, you need to make sure the slb
>>> shadow is updated at the same time
>>
>> Well - what happens if we don't? We'd get a segment fault when phyp
>> stole our entry! So what? Let it fault, see the mapping is already
>> there and get back in again :-).
>
> The problem is you won't take the segment fault as PHYP may put a  
> valid
> entry in there.  PHYP will put back what's in the shadow buffer, which
> could be valid hence no segment fault.

The shadow buffer contains V=0 entries :).

Alex

>
>>> (BTW dumb question: can we run this
>>> under PHYP?)
>>
>> Yes, I tested it on bare metal, phyp and a PS3.
>
> Nice!
>
> Mikey
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
  2009-10-30 15:47     ` Alexander Graf
@ 2009-11-03  8:47       ` Segher Boessenkool
  -1 siblings, 0 replies; 244+ messages in thread
From: Segher Boessenkool @ 2009-11-03  8:47 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Kevin Wolf, Arnd Bergmann, Hollis Blanchard, Marcelo Tosatti,
	kvm-ppc, linuxppc-dev, Avi Kivity, kvm, bphilips, Olof Johansson

Nice patchset.  Some comments on the emulation part:

> +#define OP_31_XOP_EIOIO		854

You mean EIEIO.

> +	case 19:
> +		switch (get_xop(inst)) {
> +		case OP_19_XOP_RFID:
> +		case OP_19_XOP_RFI:
> +			vcpu->arch.pc = vcpu->arch.srr0;
> +			kvmppc_set_msr(vcpu, vcpu->arch.srr1);
> +			*advance = 0;
> +			break;

I think you should only emulate the insns that exist on whatever the  
guest
pretends to be.  RFID exists only on 64-bit implementations.  Same
comment
everywhere else.

> +		case OP_31_XOP_EIOIO:
> +			break;

Have you always executed an eieio or sync when you get here, or
do you just not allow direct access to I/O devices?  Other context
synchronising insns are not enough, they do not broadcast on the
bus.

> +		case OP_31_XOP_DCBZ:
> +		{
> +			ulong rb =  vcpu->arch.gpr[get_rb(inst)];
> +			ulong ra = 0;
> +			ulong addr;
> +			u32 zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
> +
> +			if (get_ra(inst))
> +				ra = vcpu->arch.gpr[get_ra(inst)];
> +
> +			addr = (ra + rb) & ~31ULL;
> +			if (!(vcpu->arch.msr & MSR_SF))
> +				addr &= 0xffffffff;
> +
> +			if (kvmppc_st(vcpu, addr, 32, zeros)) {

DCBZ zeroes out a cache line, not 32 bytes; except on 970, where there
are HID bits to make it work on 32 bytes only, and an extra DCBZL insn
that always clears a full cache line (128 bytes).

> +	switch (sprn) {
> +	case SPRN_IBAT0U ... SPRN_IBAT3L:
> +		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT0U) / 2];
> +		break;
> +	case SPRN_IBAT4U ... SPRN_IBAT7L:
> +		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT4U) / 2];
> +		break;
> +	case SPRN_DBAT0U ... SPRN_DBAT3L:
> +		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT0U) / 2];
> +		break;
> +	case SPRN_DBAT4U ... SPRN_DBAT7L:
> +		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT4U) / 2];
> +		break;

Do xBAT4..7 have the same SPR numbers on all CPUs?  They are CPU- 
specific
SPRs, after all.  Some CPUs have only six, some only four, some none,  
btw.

> +	case SPRN_HID0:
> +		to_book3s(vcpu)->hid[0] = vcpu->arch.gpr[rs];
> +		break;
> +	case SPRN_HID1:
> +		to_book3s(vcpu)->hid[1] = vcpu->arch.gpr[rs];
> +		break;
> +	case SPRN_HID2:
> +		to_book3s(vcpu)->hid[2] = vcpu->arch.gpr[rs];
> +		break;
> +	case SPRN_HID4:
> +		to_book3s(vcpu)->hid[4] = vcpu->arch.gpr[rs];
> +		break;
> +	case SPRN_HID5:
> +		to_book3s(vcpu)->hid[5] = vcpu->arch.gpr[rs];

HIDs are different per CPU; and worse, different CPUs have different
registers (SPR #s) for the same register name!

> +		/* guest HID5 set can change is_dcbz32 */
> +		if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
> +		    (mfmsr() & MSR_HV))
> +			vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
> +		break;

Wait, does this mean you allow other HID writes when MSR[HV] isn't
set?  All HIDs (and many other SPRs) cannot be read or written in
supervisor mode.


Segher
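
To illustrate the point about 64-bit-only instructions, the rfid case could be
gated roughly like this -- vcpu_is_book3s_64() is a made-up predicate (a real
check might look at the guest PVR), and "emulated" is the usual
emulation_result local:

	case OP_19_XOP_RFID:
		if (!vcpu_is_book3s_64(vcpu)) {
			/* 32-bit guest: rfid does not exist, fail the emulation */
			emulated = EMULATE_FAIL;
			break;
		}
		/* fall through */
	case OP_19_XOP_RFI:
		vcpu->arch.pc = vcpu->arch.srr0;
		kvmppc_set_msr(vcpu, vcpu->arch.srr1);
		*advance = 0;
		break;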

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
@ 2009-11-03  8:47       ` Segher Boessenkool
  0 siblings, 0 replies; 244+ messages in thread
From: Segher Boessenkool @ 2009-11-03  8:47 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Kevin Wolf, Arnd Bergmann, Hollis Blanchard, Marcelo Tosatti,
	kvm-ppc, linuxppc-dev, Avi Kivity, kvm, bphilips, Olof Johansson

Nice patchset.  Some comments on the emulation part:

> +#define OP_31_XOP_EIOIO		854

You mean EIEIO.

> +	case 19:
> +		switch (get_xop(inst)) {
> +		case OP_19_XOP_RFID:
> +		case OP_19_XOP_RFI:
> +			vcpu->arch.pc = vcpu->arch.srr0;
> +			kvmppc_set_msr(vcpu, vcpu->arch.srr1);
> +			*advance = 0;
> +			break;

I think you should only emulate the insns that exist on whatever the  
guest
pretends to be.  RFID exists only on 64-bit implementations.  Same
comment
everywhere else.

> +		case OP_31_XOP_EIOIO:
> +			break;

Have you always executed an eieio or sync when you get here, or
do you just not allow direct access to I/O devices?  Other context
synchronising insns are not enough, they do not broadcast on the
bus.

> +		case OP_31_XOP_DCBZ:
> +		{
> +			ulong rb =  vcpu->arch.gpr[get_rb(inst)];
> +			ulong ra = 0;
> +			ulong addr;
> +			u32 zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
> +
> +			if (get_ra(inst))
> +				ra = vcpu->arch.gpr[get_ra(inst)];
> +
> +			addr = (ra + rb) & ~31ULL;
> +			if (!(vcpu->arch.msr & MSR_SF))
> +				addr &= 0xffffffff;
> +
> +			if (kvmppc_st(vcpu, addr, 32, zeros)) {

DCBZ zeroes out a cache line, not 32 bytes; except on 970, where there
are HID bits to make it work on 32 bytes only, and an extra DCBZL insn
that always clears a full cache line (128 bytes).

> +	switch (sprn) {
> +	case SPRN_IBAT0U ... SPRN_IBAT3L:
> +		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT0U) / 2];
> +		break;
> +	case SPRN_IBAT4U ... SPRN_IBAT7L:
> +		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT4U) / 2];
> +		break;
> +	case SPRN_DBAT0U ... SPRN_DBAT3L:
> +		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT0U) / 2];
> +		break;
> +	case SPRN_DBAT4U ... SPRN_DBAT7L:
> +		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT4U) / 2];
> +		break;

Do xBAT4..7 have the same SPR numbers on all CPUs?  They are CPU- 
specific
SPRs, after all.  Some CPUs have only six, some only four, some none,  
btw.

> +	case SPRN_HID0:
> +		to_book3s(vcpu)->hid[0] = vcpu->arch.gpr[rs];
> +		break;
> +	case SPRN_HID1:
> +		to_book3s(vcpu)->hid[1] = vcpu->arch.gpr[rs];
> +		break;
> +	case SPRN_HID2:
> +		to_book3s(vcpu)->hid[2] = vcpu->arch.gpr[rs];
> +		break;
> +	case SPRN_HID4:
> +		to_book3s(vcpu)->hid[4] = vcpu->arch.gpr[rs];
> +		break;
> +	case SPRN_HID5:
> +		to_book3s(vcpu)->hid[5] = vcpu->arch.gpr[rs];

HIDs are different per CPU; and worse, different CPUs have different
registers (SPR #s) for the same register name!

> +		/* guest HID5 set can change is_dcbz32 */
> +		if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
> +		    (mfmsr() & MSR_HV))
> +			vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
> +		break;

Wait, does this mean you allow other HID writes when MSR[HV] isn't
set?  All HIDs (and many other SPRs) cannot be read or written in
supervisor mode.


Segher


^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
  2009-11-03  8:47       ` Segher Boessenkool
  (?)
@ 2009-11-03  9:06           ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-11-03  9:06 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, Kevin Wolf, Arnd Bergmann,
	Hollis Blanchard, Marcelo Tosatti, kvm-ppc,
	linuxppc-dev-mnsaURCQ41sdnm+yROfE0A, Avi Kivity,
	bphilips-l3A5Bk7waGM, Olof Johansson


On 03.11.2009, at 09:47, Segher Boessenkool wrote:

> Nice patchset.  Some comments on the emulation part:

Cool, thanks for looking through them!

>> +#define OP_31_XOP_EIOIO		854
>
> You mean EIEIO.

Probably, yeah.

>> +	case 19:
>> +		switch (get_xop(inst)) {
>> +		case OP_19_XOP_RFID:
>> +		case OP_19_XOP_RFI:
>> +			vcpu->arch.pc = vcpu->arch.srr0;
>> +			kvmppc_set_msr(vcpu, vcpu->arch.srr1);
>> +			*advance = 0;
>> +			break;
>
> I think you should only emulate the insns that exist on whatever the  
> guest
> pretends to be.  RFID exists only on 64-bit implementations.  Same
> comment
> everywhere else.

True.

>
>> +		case OP_31_XOP_EIOIO:
>> +			break;
>
> Have you always executed an eieio or sync when you get here, or
> do you just not allow direct access to I/O devices?  Other context
> synchronising insns are not enough, they do not broadcast on the
> bus.

There is no device passthrough yet :-). It's theoretically possible,  
but nothing for it is implemented so far.

>
>> +		case OP_31_XOP_DCBZ:
>> +		{
>> +			ulong rb =  vcpu->arch.gpr[get_rb(inst)];
>> +			ulong ra = 0;
>> +			ulong addr;
>> +			u32 zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
>> +
>> +			if (get_ra(inst))
>> +				ra = vcpu->arch.gpr[get_ra(inst)];
>> +
>> +			addr = (ra + rb) & ~31ULL;
>> +			if (!(vcpu->arch.msr & MSR_SF))
>> +				addr &= 0xffffffff;
>> +
>> +			if (kvmppc_st(vcpu, addr, 32, zeros)) {
>
> DCBZ zeroes out a cache line, not 32 bytes; except on 970, where there
> are HID bits to make it work on 32 bytes only, and an extra DCBZL insn
> that always clears a full cache line (128 bytes).

Yes. We only come here when we patched the dcbz opcodes to invalid  
instructions because cache line size of target == 32.
On 970 with MSR_HV = 0 we actually use the dcbz 32-bytes mode.

Admittedly though, this could be a lot more clever.

>> +	switch (sprn) {
>> +	case SPRN_IBAT0U ... SPRN_IBAT3L:
>> +		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT0U) / 2];
>> +		break;
>> +	case SPRN_IBAT4U ... SPRN_IBAT7L:
>> +		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT4U) / 2];
>> +		break;
>> +	case SPRN_DBAT0U ... SPRN_DBAT3L:
>> +		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT0U) / 2];
>> +		break;
>> +	case SPRN_DBAT4U ... SPRN_DBAT7L:
>> +		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT4U) / 2];
>> +		break;
>
> Do xBAT4..7 have the same SPR numbers on all CPUs?  They are CPU- 
> specific
> SPRs, after all.  Some CPUs have only six, some only four, some  
> none, btw.

For now only Linux runs as a guest, and it only uses the first 3(?) IIRC. But
yes, it's probably worth looking into at one point or the other.
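
As a sketch of what a guarded variant might look like (guest_has_high_bats()
is a made-up predicate, and the indexing assumes an 8-entry shadow BAT array
with BAT4-7 stored behind BAT0-3):

	case SPRN_IBAT4U ... SPRN_IBAT7L:
		if (!guest_has_high_bats(vcpu)) {
			/* guest CPU model without IBAT4-7 */
			emulated = EMULATE_FAIL;
			break;
		}
		bat = &vcpu_book3s->ibat[4 + ((sprn - SPRN_IBAT4U) / 2)];
		break;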

>
>> +	case SPRN_HID0:
>> +		to_book3s(vcpu)->hid[0] = vcpu->arch.gpr[rs];
>> +		break;
>> +	case SPRN_HID1:
>> +		to_book3s(vcpu)->hid[1] = vcpu->arch.gpr[rs];
>> +		break;
>> +	case SPRN_HID2:
>> +		to_book3s(vcpu)->hid[2] = vcpu->arch.gpr[rs];
>> +		break;
>> +	case SPRN_HID4:
>> +		to_book3s(vcpu)->hid[4] = vcpu->arch.gpr[rs];
>> +		break;
>> +	case SPRN_HID5:
>> +		to_book3s(vcpu)->hid[5] = vcpu->arch.gpr[rs];
>
> HIDs are different per CPU; and worse, different CPUs have different
> registers (SPR #s) for the same register name!

Sigh :-(

>> +		/* guest HID5 set can change is_dcbz32 */
>> +		if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
>> +		    (mfmsr() & MSR_HV))
>> +			vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
>> +		break;
>
> Wait, does this mean you allow other HID writes when MSR[HV] isn't
> set?  All HIDs (and many other SPRs) cannot be read or written in
> supervisor mode.

When we're running in MSR_HV=0 mode on a 970 we can use the 32 byte  
dcbz HID flag. So all we need to do is tell our entry/exit code to set  
this bit.

If we're on 970 on a hypervisor or on a non-970 though we can't use  
the HID5 bit, so we need to binary patch the opcodes.

So in order to emulate real 970 behavior, we need to be able to  
emulate that HID5 bit too! That's what this chunk of code does - it  
basically sets us in dcbz32 mode when allowed on 970 guests.

Alex
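
A rough sketch of how the dcbz emulation could use the guest's real block size
instead of a hard-wired 32 bytes -- guest_dcache_bsize is an invented per-vcpu
field, the rest follows the existing kvmppc_st() usage:

	u32 bsize = vcpu->arch.guest_dcache_bsize;	/* e.g. 32 or 128 */
	u8 zeros[128] = { 0 };
	ulong addr = (ra + rb) & ~((ulong)bsize - 1);

	if (!(vcpu->arch.msr & MSR_SF))
		addr &= 0xffffffff;

	if (kvmppc_st(vcpu, addr, bsize, zeros)) {
		/* unmapped -> forward a data storage interrupt, as today */
	}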

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
@ 2009-11-03  9:06           ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-11-03  9:06 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Kevin Wolf, Arnd Bergmann, Hollis Blanchard, Marcelo Tosatti,
	kvm-ppc, linuxppc-dev, Avi Kivity, kvm, bphilips, Olof Johansson


On 03.11.2009, at 09:47, Segher Boessenkool wrote:

> Nice patchset.  Some comments on the emulation part:

Cool, thanks for looking through them!

>> +#define OP_31_XOP_EIOIO		854
>
> You mean EIEIO.

Probably, yeah.

>> +	case 19:
>> +		switch (get_xop(inst)) {
>> +		case OP_19_XOP_RFID:
>> +		case OP_19_XOP_RFI:
>> +			vcpu->arch.pc = vcpu->arch.srr0;
>> +			kvmppc_set_msr(vcpu, vcpu->arch.srr1);
>> +			*advance = 0;
>> +			break;
>
> I think you should only emulate the insns that exist on whatever the  
> guest
> pretends to be.  RFID exists only on 64-bit implementations.  Same
> comment
> everywhere else.

True.

>
>> +		case OP_31_XOP_EIOIO:
>> +			break;
>
> Have you always executed an eieio or sync when you get here, or
> do you just not allow direct access to I/O devices?  Other context
> synchronising insns are not enough, they do not broadcast on the
> bus.

There is no device passthrough yet :-). It's theoretically possible,  
but nothing for it is implemented so far.

>
>> +		case OP_31_XOP_DCBZ:
>> +		{
>> +			ulong rb =  vcpu->arch.gpr[get_rb(inst)];
>> +			ulong ra = 0;
>> +			ulong addr;
>> +			u32 zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
>> +
>> +			if (get_ra(inst))
>> +				ra = vcpu->arch.gpr[get_ra(inst)];
>> +
>> +			addr = (ra + rb) & ~31ULL;
>> +			if (!(vcpu->arch.msr & MSR_SF))
>> +				addr &= 0xffffffff;
>> +
>> +			if (kvmppc_st(vcpu, addr, 32, zeros)) {
>
> DCBZ zeroes out a cache line, not 32 bytes; except on 970, where there
> are HID bits to make it work on 32 bytes only, and an extra DCBZL insn
> that always clears a full cache line (128 bytes).

Yes. We only come here when we patched the dcbz opcodes to invalid  
instructions because cache line size of target == 32.
On 970 with MSR_HV = 0 we actually use the dcbz 32-bytes mode.

Admittedly though, this could be a lot more clever.

>> +	switch (sprn) {
>> +	case SPRN_IBAT0U ... SPRN_IBAT3L:
>> +		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT0U) / 2];
>> +		break;
>> +	case SPRN_IBAT4U ... SPRN_IBAT7L:
>> +		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT4U) / 2];
>> +		break;
>> +	case SPRN_DBAT0U ... SPRN_DBAT3L:
>> +		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT0U) / 2];
>> +		break;
>> +	case SPRN_DBAT4U ... SPRN_DBAT7L:
>> +		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT4U) / 2];
>> +		break;
>
> Do xBAT4..7 have the same SPR numbers on all CPUs?  They are CPU- 
> specific
> SPRs, after all.  Some CPUs have only six, some only four, some  
> none, btw.

For now only Linux runs as a guest, and it only uses the first 3(?) IIRC. But
yes, it's probably worth looking into at one point or the other.

>
>> +	case SPRN_HID0:
>> +		to_book3s(vcpu)->hid[0] = vcpu->arch.gpr[rs];
>> +		break;
>> +	case SPRN_HID1:
>> +		to_book3s(vcpu)->hid[1] = vcpu->arch.gpr[rs];
>> +		break;
>> +	case SPRN_HID2:
>> +		to_book3s(vcpu)->hid[2] = vcpu->arch.gpr[rs];
>> +		break;
>> +	case SPRN_HID4:
>> +		to_book3s(vcpu)->hid[4] = vcpu->arch.gpr[rs];
>> +		break;
>> +	case SPRN_HID5:
>> +		to_book3s(vcpu)->hid[5] = vcpu->arch.gpr[rs];
>
> HIDs are different per CPU; and worse, different CPUs have different
> registers (SPR #s) for the same register name!

Sigh :-(

>> +		/* guest HID5 set can change is_dcbz32 */
>> +		if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
>> +		    (mfmsr() & MSR_HV))
>> +			vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
>> +		break;
>
> Wait, does this mean you allow other HID writes when MSR[HV] isn't
> set?  All HIDs (and many other SPRs) cannot be read or written in
> supervisor mode.

When we're running in MSR_HV=0 mode on a 970 we can use the 32 byte  
dcbz HID flag. So all we need to do is tell our entry/exit code to set  
this bit.

If we're on 970 on a hypervisor or on a non-970 though we can't use  
the HID5 bit, so we need to binary patch the opcodes.

So in order to emulate real 970 behavior, we need to be able to  
emulate that HID5 bit too! That's what this chunk of code does - it  
basically sets us in dcbz32 mode when allowed on 970 guests.

Alex

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
@ 2009-11-03  9:06           ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-11-03  9:06 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: kvm-u79uwXL29TY76Z2rM5mHXA, Kevin Wolf, Arnd Bergmann,
	Hollis Blanchard, Marcelo Tosatti, kvm-ppc,
	linuxppc-dev-mnsaURCQ41sdnm+yROfE0A, Avi Kivity,
	bphilips-l3A5Bk7waGM, Olof Johansson


On 03.11.2009, at 09:47, Segher Boessenkool wrote:

> Nice patchset.  Some comments on the emulation part:

Cool, thanks for looking through them!

>> +#define OP_31_XOP_EIOIO		854
>
> You mean EIEIO.

Probably, yeah.

>> +	case 19:
>> +		switch (get_xop(inst)) {
>> +		case OP_19_XOP_RFID:
>> +		case OP_19_XOP_RFI:
>> +			vcpu->arch.pc = vcpu->arch.srr0;
>> +			kvmppc_set_msr(vcpu, vcpu->arch.srr1);
>> +			*advance = 0;
>> +			break;
>
> I think you should only emulate the insns that exist on whatever the  
> guest
> pretends to be.  RFID exists only on 64-bit implementations.  Same
> comment
> everywhere else.

True.

>
>> +		case OP_31_XOP_EIOIO:
>> +			break;
>
> Have you always executed an eieio or sync when you get here, or
> do you just not allow direct access to I/O devices?  Other context
> synchronising insns are not enough, they do not broadcast on the
> bus.

There is no device passthrough yet :-). It's theoretically possible,  
but nothing for it is implemented so far.

>
>> +		case OP_31_XOP_DCBZ:
>> +		{
>> +			ulong rb =  vcpu->arch.gpr[get_rb(inst)];
>> +			ulong ra = 0;
>> +			ulong addr;
>> +			u32 zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
>> +
>> +			if (get_ra(inst))
>> +				ra = vcpu->arch.gpr[get_ra(inst)];
>> +
>> +			addr = (ra + rb) & ~31ULL;
>> +			if (!(vcpu->arch.msr & MSR_SF))
>> +				addr &= 0xffffffff;
>> +
>> +			if (kvmppc_st(vcpu, addr, 32, zeros)) {
>
> DCBZ zeroes out a cache line, not 32 bytes; except on 970, where there
> are HID bits to make it work on 32 bytes only, and an extra DCBZL insn
> that always clears a full cache line (128 bytes).

Yes. We only come here when we patched the dcbz opcodes to invalid  
instructions because cache line size of target == 32.
On 970 with MSR_HV = 0 we actually use the dcbz 32-bytes mode.

Admittedly though, this could be a lot more clever.

>> +	switch (sprn) {
>> +	case SPRN_IBAT0U ... SPRN_IBAT3L:
>> +		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT0U) / 2];
>> +		break;
>> +	case SPRN_IBAT4U ... SPRN_IBAT7L:
>> +		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT4U) / 2];
>> +		break;
>> +	case SPRN_DBAT0U ... SPRN_DBAT3L:
>> +		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT0U) / 2];
>> +		break;
>> +	case SPRN_DBAT4U ... SPRN_DBAT7L:
>> +		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT4U) / 2];
>> +		break;
>
> Do xBAT4..7 have the same SPR numbers on all CPUs?  They are CPU- 
> specific
> SPRs, after all.  Some CPUs have only six, some only four, some  
> none, btw.

For now only Linux runs as a guest, and it only uses the first 3(?) IIRC. But
yes, it's probably worth looking into at one point or the other.

>
>> +	case SPRN_HID0:
>> +		to_book3s(vcpu)->hid[0] = vcpu->arch.gpr[rs];
>> +		break;
>> +	case SPRN_HID1:
>> +		to_book3s(vcpu)->hid[1] = vcpu->arch.gpr[rs];
>> +		break;
>> +	case SPRN_HID2:
>> +		to_book3s(vcpu)->hid[2] = vcpu->arch.gpr[rs];
>> +		break;
>> +	case SPRN_HID4:
>> +		to_book3s(vcpu)->hid[4] = vcpu->arch.gpr[rs];
>> +		break;
>> +	case SPRN_HID5:
>> +		to_book3s(vcpu)->hid[5] = vcpu->arch.gpr[rs];
>
> HIDs are different per CPU; and worse, different CPUs have different
> registers (SPR #s) for the same register name!

Sigh :-(

>> +		/* guest HID5 set can change is_dcbz32 */
>> +		if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
>> +		    (mfmsr() & MSR_HV))
>> +			vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
>> +		break;
>
> Wait, does this mean you allow other HID writes when MSR[HV] isn't
> set?  All HIDs (and many other SPRs) cannot be read or written in
> supervisor mode.

When we're running in MSR_HV=0 mode on a 970 we can use the 32 byte  
dcbz HID flag. So all we need to do is tell our entry/exit code to set  
this bit.

If we're on 970 on a hypervisor or on a non-970 though we can't use  
the HID5 bit, so we need to binary patch the opcodes.

So in order to emulate real 970 behavior, we need to be able to  
emulate that HID5 bit too! That's what this chunk of code does - it  
basically sets us in dcbz32 mode when allowed on 970 guests.

Alex


^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
  2009-11-03  9:06           ` Alexander Graf
@ 2009-11-03 21:38             ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 244+ messages in thread
From: Benjamin Herrenschmidt @ 2009-11-03 21:38 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Kevin Wolf, Arnd Bergmann, Hollis Blanchard, Marcelo Tosatti,
	kvm-ppc, linuxppc-dev, Avi Kivity, kvm, bphilips, Olof Johansson

On Tue, 2009-11-03 at 10:06 +0100, Alexander Graf wrote:

> > DCBZ zeroes out a cache line, not 32 bytes; except on 970, where there
> > are HID bits to make it work on 32 bytes only, and an extra DCBZL insn
> > that always clears a full cache line (128 bytes).
> 
> Yes. We only come here when we patched the dcbz opcodes to invalid  
> instructions because cache line size of target == 32.
> On 970 with MSR_HV = 0 we actually use the dcbz 32-bytes mode.
> 
> Admittedly though, this could be a lot more clever.

Yeah well, we also really need to fix ppc32 Linux to use the device-tree
provided cache line size :-) For 64-bits, that should already be the
case, and thus the emulation trick shouldn't be needed as long as you
properly provide the guest with the right size in the device-tree.

(Though glibc can be nasty, afaik it might load up optimized variants of
some routines with hard wired cache line sizes based on the CPU type)

> >> +	switch (sprn) {
> >> +	case SPRN_IBAT0U ... SPRN_IBAT3L:
> >> +		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT0U) / 2];
> >> +		break;
> >> +	case SPRN_IBAT4U ... SPRN_IBAT7L:
> >> +		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT4U) / 2];
> >> +		break;
> >> +	case SPRN_DBAT0U ... SPRN_DBAT3L:
> >> +		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT0U) / 2];
> >> +		break;
> >> +	case SPRN_DBAT4U ... SPRN_DBAT7L:
> >> +		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT4U) / 2];
> >> +		break;
> >
> > Do xBAT4..7 have the same SPR numbers on all CPUs?  They are CPU- 
> > specific
> > SPRs, after all.  Some CPUs have only six, some only four, some  
> > none, btw.
> 
> For now only Linux runs which only uses the first 3(?) IIRC. But yes,  
> it's probably worth looking into at one point or the other.
> 
> >
> >> +	case SPRN_HID0:
> >> +		to_book3s(vcpu)->hid[0] = vcpu->arch.gpr[rs];
> >> +		break;
> >> +	case SPRN_HID1:
> >> +		to_book3s(vcpu)->hid[1] = vcpu->arch.gpr[rs];
> >> +		break;
> >> +	case SPRN_HID2:
> >> +		to_book3s(vcpu)->hid[2] = vcpu->arch.gpr[rs];
> >> +		break;
> >> +	case SPRN_HID4:
> >> +		to_book3s(vcpu)->hid[4] = vcpu->arch.gpr[rs];
> >> +		break;
> >> +	case SPRN_HID5:
> >> +		to_book3s(vcpu)->hid[5] = vcpu->arch.gpr[rs];
> >
> > HIDs are different per CPU; and worse, different CPUs have different
> > registers (SPR #s) for the same register name!
> 
> Sigh :-(

On the other hand, you can probably just "Swallow" all of these and
Linux won't even notice, except for the case of the sleep state maybe on
6xx/7xx/7xxx. Just a matter of knowing what you are emulating as a guest.

Cheers,
Ben.
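
For reference, picking the block size up from the device-tree on ppc32 could
look roughly like the sketch below (standard OF property name, error handling
trimmed, fallback to the compile-time value):

	#include <asm/prom.h>

	static u32 dcache_block_size_from_dt(void)
	{
		struct device_node *cpu = of_find_node_by_type(NULL, "cpu");
		const u32 *bs = cpu ? of_get_property(cpu, "d-cache-block-size", NULL) : NULL;
		u32 size = bs ? *bs : L1_CACHE_BYTES;

		of_node_put(cpu);
		return size;
	}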

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
@ 2009-11-03 21:38             ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 244+ messages in thread
From: Benjamin Herrenschmidt @ 2009-11-03 21:38 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Kevin Wolf, Arnd Bergmann, Hollis Blanchard, Marcelo Tosatti,
	kvm-ppc, linuxppc-dev, Avi Kivity, kvm, bphilips, Olof Johansson

On Tue, 2009-11-03 at 10:06 +0100, Alexander Graf wrote:

> > DCBZ zeroes out a cache line, not 32 bytes; except on 970, where there
> > are HID bits to make it work on 32 bytes only, and an extra DCBZL insn
> > that always clears a full cache line (128 bytes).
> 
> Yes. We only come here when we patched the dcbz opcodes to invalid  
> instructions because cache line size of target == 32.
> On 970 with MSR_HV = 0 we actually use the dcbz 32-bytes mode.
> 
> Admittedly though, this could be a lot more clever.

Yeah well, we also really need to fix ppc32 Linux to use the device-tree
provided cache line size :-) For 64-bits, that should already be the
case, and thus the emulation trick shouldn't be needed as long as you
properly provide the guest with the right size in the device-tree.

(Though glibc can be nasty, afaik it might load up optimized variants of
some routines with hard wired cache line sizes based on the CPU type)

> >> +	switch (sprn) {
> >> +	case SPRN_IBAT0U ... SPRN_IBAT3L:
> >> +		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT0U) / 2];
> >> +		break;
> >> +	case SPRN_IBAT4U ... SPRN_IBAT7L:
> >> +		bat = &vcpu_book3s->ibat[(sprn - SPRN_IBAT4U) / 2];
> >> +		break;
> >> +	case SPRN_DBAT0U ... SPRN_DBAT3L:
> >> +		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT0U) / 2];
> >> +		break;
> >> +	case SPRN_DBAT4U ... SPRN_DBAT7L:
> >> +		bat = &vcpu_book3s->dbat[(sprn - SPRN_DBAT4U) / 2];
> >> +		break;
> >
> > Do xBAT4..7 have the same SPR numbers on all CPUs?  They are CPU- 
> > specific
> > SPRs, after all.  Some CPUs have only six, some only four, some  
> > none, btw.
> 
> For now only Linux runs which only uses the first 3(?) IIRC. But yes,  
> it's probably worth looking into at one point or the other.
> 
> >
> >> +	case SPRN_HID0:
> >> +		to_book3s(vcpu)->hid[0] = vcpu->arch.gpr[rs];
> >> +		break;
> >> +	case SPRN_HID1:
> >> +		to_book3s(vcpu)->hid[1] = vcpu->arch.gpr[rs];
> >> +		break;
> >> +	case SPRN_HID2:
> >> +		to_book3s(vcpu)->hid[2] = vcpu->arch.gpr[rs];
> >> +		break;
> >> +	case SPRN_HID4:
> >> +		to_book3s(vcpu)->hid[4] = vcpu->arch.gpr[rs];
> >> +		break;
> >> +	case SPRN_HID5:
> >> +		to_book3s(vcpu)->hid[5] = vcpu->arch.gpr[rs];
> >
> > HIDs are different per CPU; and worse, different CPUs have different
> > registers (SPR #s) for the same register name!
> 
> Sigh :-(

On the other hand, you can probably just "Swallow" all of these and
Linux won't even notice, except for the case of the sleep state maybe on
6xx/7xx/7xxx. Just a matter of knowing what you are emulating as a guest.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
  2009-11-03 21:38             ` Benjamin Herrenschmidt
@ 2009-11-04  8:43               ` Arnd Bergmann
  -1 siblings, 0 replies; 244+ messages in thread
From: Arnd Bergmann @ 2009-11-04  8:43 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Kevin Wolf, Hollis Blanchard, Marcelo Tosatti, Alexander Graf,
	kvm-ppc, linuxppc-dev, Avi Kivity, kvm, bphilips, Olof Johansson

On Tuesday 03 November 2009, Benjamin Herrenschmidt wrote:
> (Though glibc can be nasty, afaik it might load up optimized variants of
> some routines with hard wired cache line sizes based on the CPU type)

You can also get applications with hand-coded cache optimizations
that are even harder, if not impossible, to fix.

	Arnd <><

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
@ 2009-11-04  8:43               ` Arnd Bergmann
  0 siblings, 0 replies; 244+ messages in thread
From: Arnd Bergmann @ 2009-11-04  8:43 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Kevin Wolf, Hollis Blanchard, Marcelo Tosatti, Alexander Graf,
	kvm-ppc, linuxppc-dev, Avi Kivity, kvm, bphilips, Olof Johansson

On Tuesday 03 November 2009, Benjamin Herrenschmidt wrote:
> (Though glibc can be nasty, afaik it might load up optimized variants of
> some routines with hard wired cache line sizes based on the CPU type)

You can also get applications with hand-coded cache optimizations
that are even harder, if not impossible, to fix.

	Arnd <><

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
  2009-11-04  8:43               ` Arnd Bergmann
@ 2009-11-04  8:47                 ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 244+ messages in thread
From: Benjamin Herrenschmidt @ 2009-11-04  8:47 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Kevin Wolf, Hollis Blanchard, Marcelo Tosatti, Alexander Graf,
	kvm-ppc, linuxppc-dev, Avi Kivity, kvm, bphilips, Olof Johansson

On Wed, 2009-11-04 at 09:43 +0100, Arnd Bergmann wrote:
> On Tuesday 03 November 2009, Benjamin Herrenschmidt wrote:
> > (Though glibc can be nasty, afaik it might load up optimized
> variants of
> > some routines with hard wired cache line sizes based on the CPU
> type)
> 
> You can also get application with hand-coded cache optimizations
> that are even harder, if not impossible, to fix. 

Right. But those are already broken across CPU variants anyways.

Cheers,
Ben

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
@ 2009-11-04  8:47                 ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 244+ messages in thread
From: Benjamin Herrenschmidt @ 2009-11-04  8:47 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Kevin Wolf, Hollis Blanchard, Marcelo Tosatti, Alexander Graf,
	kvm-ppc, linuxppc-dev, Avi Kivity, kvm, bphilips, Olof Johansson

On Wed, 2009-11-04 at 09:43 +0100, Arnd Bergmann wrote:
> On Tuesday 03 November 2009, Benjamin Herrenschmidt wrote:
> > (Though glibc can be nasty, afaik it might load up optimized
> variants of
> > some routines with hard wired cache line sizes based on the CPU
> type)
> 
> You can also get application with hand-coded cache optimizations
> that are even harder, if not impossible, to fix. 

Right. But those are already broken across CPU variants anyways.

Cheers,
Ben


^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
  2009-11-04  8:47                 ` Benjamin Herrenschmidt
  (?)
@ 2009-11-04 11:35                   ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-11-04 11:35 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Arnd Bergmann, Segher Boessenkool, kvm-u79uwXL29TY76Z2rM5mHXA,
	Kevin Wolf, Hollis Blanchard, Marcelo Tosatti, kvm-ppc,
	linuxppc-dev-mnsaURCQ41sdnm+yROfE0A, Avi Kivity,
	bphilips-l3A5Bk7waGM, Olof Johansson


On 04.11.2009, at 09:47, Benjamin Herrenschmidt wrote:

> On Wed, 2009-11-04 at 09:43 +0100, Arnd Bergmann wrote:
>> On Tuesday 03 November 2009, Benjamin Herrenschmidt wrote:
>>> (Though glibc can be nasty, afaik it might load up optimized
>> variants of
>>> some routines with hard wired cache line sizes based on the CPU
>> type)
>>
>> You can also get application with hand-coded cache optimizations
>> that are even harder, if not impossible, to fix.
>
> Right. But those are already broken across CPU variants anyways.

... which might be the reason you're using KVM in the first place.

Alex

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
@ 2009-11-04 11:35                   ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-11-04 11:35 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Kevin Wolf, Hollis Blanchard, Marcelo Tosatti, kvm-ppc,
	linuxppc-dev, Arnd Bergmann, Avi Kivity, kvm, bphilips,
	Olof Johansson


On 04.11.2009, at 09:47, Benjamin Herrenschmidt wrote:

> On Wed, 2009-11-04 at 09:43 +0100, Arnd Bergmann wrote:
>> On Tuesday 03 November 2009, Benjamin Herrenschmidt wrote:
>>> (Though glibc can be nasty, afaik it might load up optimized
>> variants of
>>> some routines with hard wired cache line sizes based on the CPU
>> type)
>>
>> You can also get application with hand-coded cache optimizations
>> that are even harder, if not impossible, to fix.
>
> Right. But those are already broken across CPU variants anyways.

... which might be the reason you're using KVM in the first place.

Alex

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
@ 2009-11-04 11:35                   ` Alexander Graf
  0 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-11-04 11:35 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Arnd Bergmann, Segher Boessenkool, kvm-u79uwXL29TY76Z2rM5mHXA,
	Kevin Wolf, Hollis Blanchard, Marcelo Tosatti, kvm-ppc,
	linuxppc-dev-mnsaURCQ41sdnm+yROfE0A, Avi Kivity,
	bphilips-l3A5Bk7waGM, Olof Johansson


On 04.11.2009, at 09:47, Benjamin Herrenschmidt wrote:

> On Wed, 2009-11-04 at 09:43 +0100, Arnd Bergmann wrote:
>> On Tuesday 03 November 2009, Benjamin Herrenschmidt wrote:
>>> (Though glibc can be nasty, afaik it might load up optimized
>> variants of
>>> some routines with hard wired cache line sizes based on the CPU
>> type)
>>
>> You can also get application with hand-coded cache optimizations
>> that are even harder, if not impossible, to fix.
>
> Right. But those are already broken across CPU variants anyways.

... which might be the reason you're using KVM in the first place.

Alex


^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
  2009-11-03  9:06           ` Alexander Graf
@ 2009-11-05  0:53             ` Segher Boessenkool
  -1 siblings, 0 replies; 244+ messages in thread
From: Segher Boessenkool @ 2009-11-05  0:53 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Kevin Wolf, Arnd Bergmann, Hollis Blanchard, Marcelo Tosatti,
	kvm-ppc, linuxppc-dev, Avi Kivity, kvm, bphilips, Olof Johansson

>>> +		case OP_31_XOP_EIOIO:
>>> +			break;
>>
>> Have you always executed an eieio or sync when you get here, or
>> do you just not allow direct access to I/O devices?  Other context
>> synchronising insns are not enough, they do not broadcast on the
>> bus.
>
> There is no device passthrough yet :-). It's theoretically  
> possible, but nothing for it is implemented so far.

You could just always do an eieio here, it's not expensive at all
compared to the emulation trap itself.

However -- eieio is a Book II insn, it will never trap anyway!

>>> +		case OP_31_XOP_DCBZ:
>>> +		{
>>> +			ulong rb =  vcpu->arch.gpr[get_rb(inst)];
>>> +			ulong ra = 0;
>>> +			ulong addr;
>>> +			u32 zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
>>> +
>>> +			if (get_ra(inst))
>>> +				ra = vcpu->arch.gpr[get_ra(inst)];
>>> +
>>> +			addr = (ra + rb) & ~31ULL;
>>> +			if (!(vcpu->arch.msr & MSR_SF))
>>> +				addr &= 0xffffffff;
>>> +
>>> +			if (kvmppc_st(vcpu, addr, 32, zeros)) {
>>
>> DCBZ zeroes out a cache line, not 32 bytes; except on 970, where  
>> there
>> are HID bits to make it work on 32 bytes only, and an extra DCBZL  
>> insn
>> that always clears a full cache line (128 bytes).
>
> Yes. We only come here when we patched the dcbz opcodes to invalid  
> instructions

Ah yes, I forgot.  Could you rename it to OP_31_XOP_FAKE_32BIT_DCBZ
or such?

> because cache line size of target == 32.
> On 970 with MSR_HV = 0 we actually use the dcbz 32-bytes mode.
>
> Admittedly though, this could be a lot more clever.

>>> +		/* guest HID5 set can change is_dcbz32 */
>>> +		if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
>>> +		    (mfmsr() & MSR_HV))
>>> +			vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
>>> +		break;
>>
>> Wait, does this mean you allow other HID writes when MSR[HV] isn't
>> set?  All HIDs (and many other SPRs) cannot be read or written in
>> supervisor mode.
>
> When we're running in MSR_HV=0 mode on a 970 we can use the 32 byte  
> dcbz HID flag. So all we need to do is tell our entry/exit code to  
> set this bit.

Which patch contains that entry/exit code?

> If we're on 970 on a hypervisor or on a non-970 though we can't use  
> the HID5 bit, so we need to binary patch the opcodes.
>
> So in order to emulate real 970 behavior, we need to be able to  
> emulate that HID5 bit too! That's what this chunk of code does - it  
> basically sets us in dcbz32 mode when allowed on 970 guests.

But when MSR[HV]=0 and MSR[PR]=0, mtspr to a hypervisor resource
will not trap but be silently ignored.  Sorry for not being more clear.
...Oh.  You run your guest as MSR[PR]=1 anyway!  Tricky.


Segher
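
If device passthrough ever materialises, the handler could simply execute a
real eieio on the host, e.g.:

	case OP_31_XOP_EIOIO:
		/* make the emulated eieio really order storage accesses */
		asm volatile("eieio" : : : "memory");
		break;

though, as noted above, eieio is a Book II instruction and will not trap in
the first place.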

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
@ 2009-11-05  0:53             ` Segher Boessenkool
  0 siblings, 0 replies; 244+ messages in thread
From: Segher Boessenkool @ 2009-11-05  0:53 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Kevin Wolf, Arnd Bergmann, Hollis Blanchard, Marcelo Tosatti,
	kvm-ppc, linuxppc-dev, Avi Kivity, kvm, bphilips, Olof Johansson

>>> +		case OP_31_XOP_EIOIO:
>>> +			break;
>>
>> Have you always executed an eieio or sync when you get here, or
>> do you just not allow direct access to I/O devices?  Other context
>> synchronising insns are not enough, they do not broadcast on the
>> bus.
>
> There is no device passthrough yet :-). It's theoretically  
> possible, but nothing for it is implemented so far.

You could just always do an eieio here, it's not expensive at all
compared to the emulation trap itself.

However -- eieio is a Book II insn, it will never trap anyway!

>>> +		case OP_31_XOP_DCBZ:
>>> +		{
>>> +			ulong rb =  vcpu->arch.gpr[get_rb(inst)];
>>> +			ulong ra = 0;
>>> +			ulong addr;
>>> +			u32 zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
>>> +
>>> +			if (get_ra(inst))
>>> +				ra = vcpu->arch.gpr[get_ra(inst)];
>>> +
>>> +			addr = (ra + rb) & ~31ULL;
>>> +			if (!(vcpu->arch.msr & MSR_SF))
>>> +				addr &= 0xffffffff;
>>> +
>>> +			if (kvmppc_st(vcpu, addr, 32, zeros)) {
>>
>> DCBZ zeroes out a cache line, not 32 bytes; except on 970, where  
>> there
>> are HID bits to make it work on 32 bytes only, and an extra DCBZL  
>> insn
>> that always clears a full cache line (128 bytes).
>
> Yes. We only come here when we patched the dcbz opcodes to invalid  
> instructions

Ah yes, I forgot.  Could you rename it to OP_31_XOP_FAKE_32BIT_DCBZ
or such?

> because cache line size of target == 32.
> On 970 with MSR_HV = 0 we actually use the dcbz 32-bytes mode.
>
> Admittedly though, this could be a lot more clever.

>>> +		/* guest HID5 set can change is_dcbz32 */
>>> +		if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
>>> +		    (mfmsr() & MSR_HV))
>>> +			vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
>>> +		break;
>>
>> Wait, does this mean you allow other HID writes when MSR[HV] isn't
>> set?  All HIDs (and many other SPRs) cannot be read or written in
>> supervisor mode.
>
> When we're running in MSR_HV=0 mode on a 970 we can use the 32 byte  
> dcbz HID flag. So all we need to do is tell our entry/exit code to  
> set this bit.

Which patch contains that entry/exit code?

> If we're on 970 on a hypervisor or on a non-970 though we can't use  
> the HID5 bit, so we need to binary patch the opcodes.
>
> So in order to emulate real 970 behavior, we need to be able to  
> emulate that HID5 bit too! That's what this chunk of code does - it  
> basically sets us in dcbz32 mode when allowed on 970 guests.

But when MSR[HV]=0 and MSR[PR]=0, mtspr to a hypervisor resource
will not trap but be silently ignored.  Sorry for not being more clear.
...Oh.  You run your guest as MSR[PR]=1 anyway!  Tricky.


Segher


^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v6
  2009-10-30 15:47 ` Alexander Graf
  (?)
@ 2009-11-05  6:03     ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 244+ messages in thread
From: Benjamin Herrenschmidt @ 2009-11-05  6:03 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Kevin Wolf, Arnd Bergmann, Hollis Blanchard, Marcelo Tosatti,
	kvm-ppc, linuxppc-dev, Avi Kivity, kvm, bphilips, Olof Johansson

On Fri, 2009-10-30 at 16:47 +0100, Alexander Graf wrote:
> KVM for PowerPC only supports embedded cores at the moment.
> 
> While it makes sense to virtualize on small machines, it's even more fun
> to do so on big boxes. So I figured we need KVM for PowerPC64 as well.

I get that with exit timing enabled:

arch/powerpc/kvm/timing.c:205: error: ‘THIS_MODULE’ undeclared here (not in a function)

I'll stick a fixup patch in my tree (just adding #include <linux/module.h>)
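
A sketch of that fixup as a one-line hunk (context lines omitted; only
the added include comes from this mail):

--- a/arch/powerpc/kvm/timing.c
+++ b/arch/powerpc/kvm/timing.c
@@
+#include <linux/module.h>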

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 244+ messages in thread

* Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
  2009-11-05  0:53             ` Segher Boessenkool
@ 2009-11-05 10:09               ` Alexander Graf
  -1 siblings, 0 replies; 244+ messages in thread
From: Alexander Graf @ 2009-11-05 10:09 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: kvm, Kevin Wolf, Arnd Bergmann, Hollis Blanchard,
	Marcelo Tosatti, kvm-ppc, linuxppc-dev, Avi Kivity, bphilips,
	Olof Johansson


On 05.11.2009, at 01:53, Segher Boessenkool wrote:

>>>> +		case OP_31_XOP_EIOIO:
>>>> +			break;
>>>
>>> Have you always executed an eieio or sync when you get here, or
>>> do you just not allow direct access to I/O devices?  Other context
>>> synchronising insns are not enough, they do not broadcast on the
>>> bus.
>>
>> There is no device passthrough yet :-). It's theoretically  
>> possible, but nothing for it is implemented so far.
>
> You could just always do an eieio here, it's not expensive at all
> compared to the emulation trap itself.
>
> However -- eieio is a Book II insn, it will never trap anyway!

Don't all 31 ops trap? I'm pretty sure I added the emulation because I  
saw the trap.

>>>> +		case OP_31_XOP_DCBZ:
>>>> +		{
>>>> +			ulong rb =  vcpu->arch.gpr[get_rb(inst)];
>>>> +			ulong ra = 0;
>>>> +			ulong addr;
>>>> +			u32 zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
>>>> +
>>>> +			if (get_ra(inst))
>>>> +				ra = vcpu->arch.gpr[get_ra(inst)];
>>>> +
>>>> +			addr = (ra + rb) & ~31ULL;
>>>> +			if (!(vcpu->arch.msr & MSR_SF))
>>>> +				addr &= 0xffffffff;
>>>> +
>>>> +			if (kvmppc_st(vcpu, addr, 32, zeros)) {
>>>
>>> DCBZ zeroes out a cache line, not 32 bytes; except on 970, where  
>>> there
>>> are HID bits to make it work on 32 bytes only, and an extra DCBZL  
>>> insn
>>> that always clears a full cache line (128 bytes).
>>
>> Yes. We only come here when we patched the dcbz opcodes to invalid  
>> instructions
>
> Ah yes, I forgot.  Could you rename it to OP_31_XOP_FAKE_32BIT_DCBZ
> or such?

Good idea.

>> because cache line size of target == 32.
>> On 970 with MSR_HV = 0 we actually use the dcbz 32-bytes mode.
>>
>> Admittedly though, this could be a lot more clever.
>
>>>> +		/* guest HID5 set can change is_dcbz32 */
>>>> +		if (vcpu->arch.mmu.is_dcbz32(vcpu) &&
>>>> +		    (mfmsr() & MSR_HV))
>>>> +			vcpu->arch.hflags |= BOOK3S_HFLAG_DCBZ32;
>>>> +		break;
>>>
>>> Wait, does this mean you allow other HID writes when MSR[HV] isn't
>>> set?  All HIDs (and many other SPRs) cannot be read or written in
>>> supervisor mode.
>>
>> When we're running in MSR_HV=0 mode on a 970 we can use the 32 byte  
>> dcbz HID flag. So all we need to do is tell our entry/exit code to  
>> set this bit.
>
> Which patch contains that entry/exit code?

That's patch 7 / 27.

+	/* Some guests may need to have dcbz set to 32 byte length.
+	 *
+	 * Usually we ensure that by patching the guest's instructions
+	 * to trap on dcbz and emulate it in the hypervisor.
+	 *
+	 * If we can, we should tell the CPU to use 32 byte dcbz though,
+	 * because that's a lot faster.
+	 */
+
+	ld	r3, VCPU_HFLAGS(r4)
+	rldicl.	r3, r3, 0, 63		/* CR = ((r3 & 1) == 0) */
+	beq	no_dcbz32_on
+
+	mfspr   r3,SPRN_HID5
+	ori     r3, r3, 0x80		/* XXX HID5_dcbz32 = 0x80 */
+	mtspr   SPRN_HID5,r3
+
+no_dcbz32_on:

>> If we're on 970 on a hypervisor or on a non-970 though we can't use  
>> the HID5 bit, so we need to binary patch the opcodes.
>>
>> So in order to emulate real 970 behavior, we need to be able to  
>> emulate that HID5 bit too! That's what this chunk of code does - it  
>> basically sets us in dcbz32 mode when allowed on 970 guests.
>
> But when MSR[HV]=0 and MSR[PR]=0, mtspr to a hypervisor resource
> will not trap but be silently ignored.  Sorry for not being more  
> clear.
> ...Oh.  You run your guest as MSR[PR]=1 anyway!  Tricky.

Yeah, the guest is always running in PR=1, so all HV checks are for  
the host. Usually we run in HV=1 on the host, because IBM doesn't sell  
machines that have HV=0 accessible for mortals :-).
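
As a rough sketch of what "the guest always runs with PR=1" means for the
MSR that is really in effect while guest code executes (the names and the
exact bit manipulation here are illustrative, not lifted from the patches):

#define MSR_PR	(1UL << 14)	/* problem state, usual PPC MSR layout */

static unsigned long build_shadow_msr(unsigned long guest_msr)
{
	/* Start from what the guest believes its MSR is... */
	unsigned long shadow_msr = guest_msr;

	/* ...but force problem state, so privileged and hypervisor-only
	 * instructions trap back to the host for emulation. */
	shadow_msr |= MSR_PR;

	return shadow_msr;
}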


I'll address your comments in a follow-up patch once the stuff is  
merged.

Alex


^ permalink raw reply	[flat|nested] 244+ messages in thread

end of thread (newest message: 2009-11-05 10:09 UTC)

Thread overview: 244+ messages
2009-09-29  8:17 [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4 Alexander Graf
2009-09-30  8:42 ` Avi Kivity
2009-09-30  8:47 ` Alexander Graf
2009-09-30  8:59 ` Avi Kivity
2009-09-30  9:11 ` Alexander Graf
2009-09-30  9:24 ` Avi Kivity
2009-09-30  9:37 ` Alexander Graf
2009-10-02  0:26 ` Benjamin Herrenschmidt
2009-10-02  0:32 ` Benjamin Herrenschmidt
2009-10-03 10:08 ` Avi Kivity
2009-10-03 10:58 ` Benjamin Herrenschmidt
2009-10-03 11:10 ` Benjamin Herrenschmidt
2009-10-21 15:03 [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5 Alexander Graf
2009-10-21 15:03 ` Alexander Graf
2009-10-21 15:03 ` [PATCH 01/27] Move dirty logging code to sub-arch Alexander Graf
2009-10-21 15:03   ` Alexander Graf
     [not found]   ` <1256137413-15256-2-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
2009-10-21 15:03     ` [PATCH 02/27] Pass PVR in sregs Alexander Graf
2009-10-21 15:03       ` Alexander Graf
2009-10-21 15:03       ` [PATCH 03/27] Add Book3s definitions Alexander Graf
2009-10-21 15:03         ` Alexander Graf
2009-10-21 15:03         ` [PATCH 04/27] Add Book3s fields to vcpu structs Alexander Graf
2009-10-21 15:03           ` Alexander Graf
2009-10-21 15:03           ` [PATCH 05/27] Add asm/kvm_book3s.h Alexander Graf
2009-10-21 15:03             ` Alexander Graf
2009-10-21 15:03             ` [PATCH 06/27] Add Book3s_64 intercept helpers Alexander Graf
2009-10-21 15:03               ` Alexander Graf
2009-10-21 15:03               ` [PATCH 07/27] Add book3s_64 highmem asm code Alexander Graf
2009-10-21 15:03                 ` Alexander Graf
     [not found]                 ` <1256137413-15256-8-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
2009-10-21 15:03                   ` [PATCH 08/27] Add SLB switching code for entry/exit Alexander Graf
2009-10-21 15:03                     ` Alexander Graf
     [not found]                     ` <1256137413-15256-9-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
2009-10-21 15:03                       ` [PATCH 09/27] Add interrupt handling code Alexander Graf
2009-10-21 15:03                         ` Alexander Graf
     [not found]                         ` <1256137413-15256-10-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
2009-10-21 15:03                           ` [PATCH 10/27] Add book3s.c Alexander Graf
2009-10-21 15:03                             ` Alexander Graf
2009-10-21 15:03                             ` [PATCH 11/27] Add book3s_64 Host MMU handling Alexander Graf
2009-10-21 15:03                               ` Alexander Graf
2009-10-21 15:03                               ` [PATCH 12/27] Add book3s_64 guest MMU Alexander Graf
2009-10-21 15:03                                 ` Alexander Graf
     [not found]                                 ` <1256137413-15256-13-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
2009-10-21 15:03                                   ` [PATCH 13/27] Add book3s_32 " Alexander Graf
2009-10-21 15:03                                     ` Alexander Graf
     [not found]                                     ` <1256137413-15256-14-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
2009-10-21 15:03                                       ` [PATCH 14/27] Add book3s_64 specific opcode emulation Alexander Graf
2009-10-21 15:03                                         ` Alexander Graf
     [not found]                                         ` <1256137413-15256-15-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
2009-10-21 15:03                                           ` [PATCH 15/27] Add mfdec emulation Alexander Graf
2009-10-21 15:03                                             ` Alexander Graf
     [not found]                                             ` <1256137413-15256-16-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
2009-10-21 15:03                                               ` [PATCH 16/27] Add desktop PowerPC specific emulation Alexander Graf
2009-10-21 15:03                                                 ` Alexander Graf
     [not found]                                                 ` <1256137413-15256-17-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
2009-10-21 15:03                                                   ` [PATCH 17/27] Make head_64.S aware of KVM real mode code Alexander Graf
2009-10-21 15:03                                                     ` Alexander Graf
2009-10-21 15:03                                                     ` [PATCH 18/27] Add Book3s_64 offsets to asm-offsets.c Alexander Graf
2009-10-21 15:03                                                       ` Alexander Graf
     [not found]                                                       ` <1256137413-15256-19-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
2009-10-21 15:03                                                         ` [PATCH 19/27] Export symbols for KVM module Alexander Graf
2009-10-21 15:03                                                           ` Alexander Graf
     [not found]                                                           ` <1256137413-15256-20-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
2009-10-21 15:03                                                             ` [PATCH 20/27] Split init_new_context and destroy_context Alexander Graf
2009-10-21 15:03                                                               ` Alexander Graf
     [not found]                                                               ` <1256137413-15256-21-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
2009-10-21 15:03                                                                 ` [PATCH 21/27] Export KVM symbols for module Alexander Graf
2009-10-21 15:03                                                                   ` Alexander Graf
2009-10-21 15:03                                                                   ` [PATCH 22/27] Add fields to PACA Alexander Graf
2009-10-21 15:03                                                                     ` Alexander Graf
     [not found]                                                                     ` <1256137413-15256-23-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
2009-10-21 15:03                                                                       ` [PATCH 23/27] Export new PACA constants in asm-offsets Alexander Graf
2009-10-21 15:03                                                                         ` Alexander Graf
2009-10-21 15:03                                                                         ` [PATCH 24/27] Include Book3s_64 target in buildsystem Alexander Graf
2009-10-21 15:03                                                                           ` Alexander Graf
     [not found]                                                                           ` <1256137413-15256-25-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
2009-10-21 15:03                                                                             ` [PATCH 25/27] Fix trace.h Alexander Graf
2009-10-21 15:03                                                                               ` Alexander Graf
2009-10-21 15:03                                                                               ` [PATCH 26/27] Use Little Endian for Dirty Bitmap Alexander Graf
2009-10-21 15:03                                                                                 ` Alexander Graf
     [not found]                                                                                 ` <1256137413-15256-27-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
2009-10-21 15:03                                                                                   ` [PATCH 27/27] Use hrtimers for the decrementer Alexander Graf
2009-10-21 15:03                                                                                     ` Alexander Graf
     [not found]                                                                         ` <1256137413-15256-24-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
2009-10-29  2:50                                                                           ` [PATCH 23/27] Export new PACA constants in asm-offsets Benjamin Herrenschmidt
2009-10-29  2:50                                                                             ` Benjamin Herrenschmidt
2009-10-29  2:50                                                                       ` [PATCH 22/27] Add fields to PACA Benjamin Herrenschmidt
2009-10-29  2:50                                                                         ` Benjamin Herrenschmidt
2009-10-29  2:48                                                                 ` [PATCH 20/27] Split init_new_context and destroy_context Benjamin Herrenschmidt
2009-10-29  2:48                                                                   ` Benjamin Herrenschmidt
2009-10-29  2:46                                                             ` [PATCH 19/27] Export symbols for KVM module Benjamin Herrenschmidt
2009-10-29  2:46                                                               ` Benjamin Herrenschmidt
2009-10-29  2:53                                                               ` Alexander Graf
2009-10-29  2:53                                                                 ` Alexander Graf
2009-10-29  2:45                                                         ` [PATCH 18/27] Add Book3s_64 offsets to asm-offsets.c Benjamin Herrenschmidt
2009-10-29  2:45                                                           ` Benjamin Herrenschmidt
     [not found]                                                     ` <1256137413-15256-18-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
2009-10-29  2:45                                                       ` [PATCH 17/27] Make head_64.S aware of KVM real mode code Benjamin Herrenschmidt
2009-10-29  2:45                                                         ` Benjamin Herrenschmidt
2009-10-21 15:22 ` [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5 Alexander Graf
2009-10-21 15:22   ` Alexander Graf
     [not found] ` <1256137413-15256-1-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
2009-10-22 13:26   ` Arnd Bergmann
2009-10-22 13:26     ` Arnd Bergmann
2009-10-23  0:33   ` Hollis Blanchard
2009-10-23  0:33     ` Hollis Blanchard
     [not found]     ` <1256258028.7495.34.camel-6XWu2dSDoRTcKpUcGLbliUEOCMrvLtNR@public.gmane.org>
2009-10-25 13:01       ` Avi Kivity
2009-10-25 13:01         ` Avi Kivity
     [not found]         ` <4AE44C14.8040507-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2009-10-26 21:18           ` Hollis Blanchard
2009-10-26 21:18             ` Hollis Blanchard
2009-10-29  2:55           ` Benjamin Herrenschmidt
2009-10-29  2:55             ` Benjamin Herrenschmidt
2009-10-26 22:46 ` Olof Johansson
2009-10-26 23:06   ` Olof Johansson
     [not found]   ` <20091026230632.GB5366-nZhT3qVonbNeoWH0uzbU5w@public.gmane.org>
2009-10-26 23:20     ` Hollis Blanchard
2009-10-26 23:20       ` Hollis Blanchard
2009-10-26 23:21       ` Olof Johansson
2009-10-26 23:21         ` Olof Johansson
2009-10-27  8:56         ` Avi Kivity
2009-10-27  8:56           ` Avi Kivity
2009-10-27 13:42           ` Alexander Graf
2009-10-27 13:42             ` Alexander Graf
     [not found]             ` <8E92E3B9-39D5-4D71-8B8E-96B49430B67B-l3A5Bk7waGM@public.gmane.org>
2009-10-27 15:49               ` Avi Kivity
2009-10-27 15:49                 ` Avi Kivity
2009-10-30 15:47 [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v6 Alexander Graf
2009-10-30 15:47 ` Alexander Graf
2009-10-30 15:47 ` Alexander Graf
2009-10-30 15:47 ` [PATCH 02/27] Pass PVR in sregs Alexander Graf
2009-10-30 15:47   ` Alexander Graf
2009-10-30 15:47   ` Alexander Graf
2009-10-30 15:47 ` [PATCH 04/27] Add Book3s fields to vcpu structs Alexander Graf
2009-10-30 15:47   ` Alexander Graf
2009-10-30 15:47   ` Alexander Graf
2009-10-30 15:47 ` [PATCH 05/27] Add asm/kvm_book3s.h Alexander Graf
2009-10-30 15:47   ` Alexander Graf
2009-10-30 15:47   ` Alexander Graf
2009-10-30 15:47 ` [PATCH 06/27] Add Book3s_64 intercept helpers Alexander Graf
2009-10-30 15:47   ` Alexander Graf
2009-10-30 15:47   ` Alexander Graf
2009-10-30 15:47 ` [PATCH 11/27] Add book3s_64 Host MMU handling Alexander Graf
2009-10-30 15:47   ` Alexander Graf
2009-10-30 15:47   ` Alexander Graf
     [not found]   ` <1256917647-6200-12-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
2009-11-01 23:39     ` Michael Neuling
2009-11-01 23:39       ` Michael Neuling
2009-11-01 23:39       ` Michael Neuling
2009-11-02  9:26       ` Alexander Graf
2009-11-02  9:26         ` Alexander Graf
2009-11-02  9:26         ` Alexander Graf
2009-10-30 15:47 ` [PATCH 17/27] Make head_64.S aware of KVM real mode code Alexander Graf
2009-10-30 15:47   ` Alexander Graf
2009-10-30 15:47   ` Alexander Graf
2009-10-30 15:47 ` [PATCH 18/27] Add Book3s_64 offsets to asm-offsets.c Alexander Graf
2009-10-30 15:47   ` Alexander Graf
2009-10-30 15:47   ` Alexander Graf
     [not found] ` <1256917647-6200-1-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
2009-10-30 15:47   ` [PATCH 01/27] Move dirty logging code to sub-arch Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47   ` [PATCH 03/27] Add Book3s definitions Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47   ` [PATCH 07/27] Add book3s_64 highmem asm code Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47   ` [PATCH 08/27] Add SLB switching code for entry/exit Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-11-01 23:23     ` Michael Neuling
2009-11-01 23:23       ` Michael Neuling
2009-11-01 23:23       ` Michael Neuling
     [not found]       ` <6695.1257117827-/owAOxkjmzZAfugRpC6u6w@public.gmane.org>
2009-11-02  9:23         ` Alexander Graf
2009-11-02  9:23           ` Alexander Graf
2009-11-02  9:23           ` Alexander Graf
     [not found]           ` <00BF2D99-F2CE-4204-B4B4-0D113FD54CE6-l3A5Bk7waGM@public.gmane.org>
2009-11-02  9:39             ` Michael Neuling
2009-11-02  9:39               ` Michael Neuling
2009-11-02  9:39               ` Michael Neuling
2009-11-02  9:59               ` Alexander Graf
2009-11-02  9:59                 ` Alexander Graf
2009-11-02  9:59                 ` Alexander Graf
2009-10-30 15:47   ` [PATCH 09/27] Add interrupt handling code Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47   ` [PATCH 10/27] Add book3s.c Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47   ` [PATCH 12/27] Add book3s_64 guest MMU Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47   ` [PATCH 13/27] Add book3s_32 " Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47   ` [PATCH 14/27] Add book3s_64 specific opcode emulation Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-11-03  8:47     ` Segher Boessenkool
2009-11-03  8:47       ` Segher Boessenkool
     [not found]       ` <A1CBD511-FF08-48BB-A8D6-9F66E20F770B-XVmvHMARGAS8U2dJNN8I7kB+6BGkLq7r@public.gmane.org>
2009-11-03  9:06         ` Alexander Graf
2009-11-03  9:06           ` Alexander Graf
2009-11-03  9:06           ` Alexander Graf
2009-11-03 21:38           ` Benjamin Herrenschmidt
2009-11-03 21:38             ` Benjamin Herrenschmidt
2009-11-04  8:43             ` Arnd Bergmann
2009-11-04  8:43               ` Arnd Bergmann
2009-11-04  8:47               ` Benjamin Herrenschmidt
2009-11-04  8:47                 ` Benjamin Herrenschmidt
2009-11-04 11:35                 ` Alexander Graf
2009-11-04 11:35                   ` Alexander Graf
2009-11-04 11:35                   ` Alexander Graf
2009-11-05  0:53           ` Segher Boessenkool
2009-11-05  0:53             ` Segher Boessenkool
2009-11-05 10:09             ` Alexander Graf
2009-11-05 10:09               ` Alexander Graf
2009-11-05 10:09               ` Alexander Graf
2009-10-30 15:47   ` [PATCH 15/27] Add mfdec emulation Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47   ` [PATCH 16/27] Add desktop PowerPC specific emulation Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47   ` [PATCH 19/27] Export symbols for KVM module Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47     ` Alexander Graf
     [not found]     ` <1256917647-6200-20-git-send-email-agraf-l3A5Bk7waGM@public.gmane.org>
2009-10-31  4:37       ` Stephen Rothwell
2009-10-31  4:37         ` Stephen Rothwell
2009-10-31  4:37         ` Stephen Rothwell
     [not found]         ` <20091031153719.10a4e61b.sfr-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org>
2009-10-31 12:02           ` Alexander Graf
2009-10-31 12:02             ` Alexander Graf
2009-10-31 12:02             ` Alexander Graf
2009-10-30 15:47   ` [PATCH 20/27] Split init_new_context and destroy_context Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-31  4:40     ` Stephen Rothwell
2009-10-31  4:40       ` Stephen Rothwell
2009-10-31  4:40       ` Stephen Rothwell
2009-10-31 21:20       ` Alexander Graf
2009-10-31 21:20         ` Alexander Graf
2009-10-31 21:20         ` Alexander Graf
2009-10-31 21:37         ` Benjamin Herrenschmidt
2009-10-31 21:37           ` Benjamin Herrenschmidt
2009-10-30 15:47   ` [PATCH 21/27] Export KVM symbols for module Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47   ` [PATCH 22/27] Add fields to PACA Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47   ` [PATCH 27/27] Use hrtimers for the decrementer Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-10-30 15:47     ` Alexander Graf
2009-11-05  6:03   ` [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v6 Benjamin Herrenschmidt
2009-11-05  6:03     ` Benjamin Herrenschmidt
2009-11-05  6:03     ` Benjamin Herrenschmidt
2009-10-30 15:47 ` [PATCH 23/27] Export new PACA constants in asm-offsets Alexander Graf
2009-10-30 15:47   ` Alexander Graf
2009-10-30 15:47   ` Alexander Graf
2009-10-30 15:47 ` [PATCH 24/27] Include Book3s_64 target in buildsystem Alexander Graf
2009-10-30 15:47   ` Alexander Graf
2009-10-30 15:47   ` Alexander Graf
2009-10-30 15:47 ` [PATCH 25/27] Fix trace.h Alexander Graf
2009-10-30 15:47   ` Alexander Graf
2009-10-30 15:47   ` Alexander Graf
2009-10-30 15:47 ` [PATCH 26/27] Use Little Endian for Dirty Bitmap Alexander Graf
2009-10-30 15:47   ` Alexander Graf
2009-10-30 15:47   ` Alexander Graf
