Re: kvm stage 2 mapping logic

From: Janne Karhunen <janne.karhunen@gmail.com>
To: Marc Zyngier <maz@kernel.org>
Cc: kvmarm@lists.cs.columbia.edu
Subject: Re: kvm stage 2 mapping logic
Date: Tue, 31 Mar 2020 16:14:06 +0300
Message-ID: <CAE=NcrYVdfuxcojLqqUe_ienYd6okdJX2wq9eHmazs1V-6QBeg@mail.gmail.com>
In-Reply-To: <20200331133609.08e89abc@why>

On Tue, Mar 31, 2020 at 3:36 PM Marc Zyngier <maz@kernel.org> wrote:

> > I'm experimenting with the kvm in order to see how it would work in
> > co-existence with a tiny external hypervisor that also runs the host
> > in el1/vmid 0.
>
> Popular theme these days...

Indeed. In my dream world I would end up with a config where the host
and the guest know very little about each other. How little that can
be remains to be seen.

> > More about this later on in case it turns out to be
> > anything generally useful, but I've been stuck for a few days now
> > understanding the kvm stage-2 (ipa-to-phys) mapping when the guest is
> > being created. Things I think I've understood so far;
> >
> > - qemu mmaps the guest memory per the machine type (virt in my case)
> > - qemu pushes the machine physical memory model in the kernel through
> > the kvm_vm_ioctl_set_memory_region()
> > - kvm has mmu notifier block set to listen to the changes to these
> > regions and it becomes active after the machine memory model arrives.
> > The mmu notifier calls handle_hva_to_gpa() that dispatches the call to
> > the appropriate map or unmap handler and these do the s2 mapping
> > changes for the vm as needed
>
> Note that these MMU notifiers only make sense when something happens on
> the host: attribute change (for page aging, for example) or unmap
> (e.g. page being swapped out).

Yes. This fooled me for a while: I thought the notifiers were what
actually created the mappings, but they are not. That was my second
miss; the first place I looked was the ioctl call itself.
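
To make the notifier's role concrete, here is a toy sketch in plain C
of a host-side unmap: the notifier is handed a host virtual range,
clips it to the memory slot, converts it to guest addresses and drops
the matching stage-2 entries. It only tears mappings down, it never
creates them. All nt_* names and the one-slot, flat-array layout are
made up for illustration; this is not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define NT_PAGE_SHIFT 12
#define NT_PAGE_SIZE  (1ULL << NT_PAGE_SHIFT)
#define NT_PAGES      16              /* toy slot: 16 guest pages */

/* Hypothetical one-slot VM: guest IPA range backed by a host VA range. */
struct nt_vm {
    uint64_t slot_ipa;                /* guest IPA where the slot starts */
    uint64_t slot_hva;                /* host VA that backs it */
    uint8_t  mapped[NT_PAGES];        /* 1 if a stage-2 entry exists */
};

/* Convert a host VA inside the slot to the guest IPA it backs. */
static uint64_t nt_hva_to_ipa(const struct nt_vm *vm, uint64_t hva)
{
    return vm->slot_ipa + (hva - vm->slot_hva);
}

/*
 * Host unmapped [hva_start, hva_end). Clip the range to the slot and
 * drop every stage-2 entry it covers. Returns the number of entries
 * dropped. The guest simply faults again on its next access.
 */
static unsigned nt_unmap_hva_range(struct nt_vm *vm,
                                   uint64_t hva_start, uint64_t hva_end)
{
    uint64_t slot_end = vm->slot_hva + NT_PAGES * NT_PAGE_SIZE;
    uint64_t start = hva_start > vm->slot_hva ? hva_start : vm->slot_hva;
    uint64_t end = hva_end < slot_end ? hva_end : slot_end;
    unsigned dropped = 0;
    uint64_t hva;

    if (start >= end)
        return 0;                     /* range does not touch the slot */

    for (hva = start & ~(NT_PAGE_SIZE - 1); hva < end; hva += NT_PAGE_SIZE) {
        uint64_t ipa = nt_hva_to_ipa(vm, hva);
        uint64_t idx = (ipa - vm->slot_ipa) >> NT_PAGE_SHIFT;

        if (vm->mapped[idx]) {
            vm->mapped[idx] = 0;
            dropped++;
        }
    }
    return dropped;
}
```

The point of the sketch is only the direction of the conversion: the
event arrives in host virtual terms and has to be translated back to
guest addresses before any stage-2 entry can be touched.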

> > - prior to starting the vm, kvm_arch_prepare_memory_region() is given
> > a try to see if any IO areas could be s2 mapped before the host is
> > allowed to execute. This is mostly an optimization?
>
> Yes, and not necessarily a useful one. I think I have a patch to drop
> that.

Ack.

> > - vcpu is started
> > - as the pages are touched when the vcpu starts executing, page faults
> > get generated and the real s2 mappings slowly start to get created.
> > LRU keeps the active pages pinned in memory, others will get evicted
> > and their s2 mapping eventually disappears
> > - all in all, the vm runs and behaves pretty much like a normal
> > userspace process
>
> Indeed, just with a different set of page tables.

Awesome. Took a while to understand the construction.

> > Is this roughly the story? If it is, I'm a bit lost where the stage2
> > page fault handler that is supposed to generate the s2 mappings is.
>
> user_mem_abort() is your friend (or not, it's a very nasty piece of
> code). If you trace the fault handling path all the way from the EL2
> vectors, you will eventually get there.

THANK YOU! My missing piece.
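
For the archives, this is roughly how I now picture the demand-fault
flow. It is a toy model in plain C and not what user_mem_abort()
actually does: every toy_* name, the single memslot, the flat table
and the fixed hva-to-pa offset are assumptions made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1ULL << TOY_PAGE_SHIFT)
#define TOY_S2_PAGES   64             /* toy table covers 64 guest pages */
#define TOY_PTE_VALID  1ULL

/* One guest RAM region as userspace registered it: IPA range -> host VA. */
struct toy_memslot {
    uint64_t base_ipa;
    uint64_t size;
    uint64_t base_hva;
};

/* Toy stage-2 "table": one flat entry per guest page, 0 means unmapped. */
struct toy_s2 {
    struct toy_memslot slot;
    uint64_t pte[TOY_S2_PAGES];       /* host PA | valid bit */
    unsigned faults;                  /* how many faults were handled */
};

/* Stand-in for pinning the host page and finding its physical address.
 * The fixed offset is pure invention. */
static uint64_t toy_hva_to_pa(uint64_t hva)
{
    return hva + 0x70000000ULL;
}

/* Find the table index for an IPA; -1 if the slot does not back it. */
static int toy_ipa_to_idx(const struct toy_s2 *s2, uint64_t ipa, uint64_t *idx)
{
    uint64_t off;

    if (ipa < s2->slot.base_ipa)
        return -1;
    off = ipa - s2->slot.base_ipa;
    if (off >= s2->slot.size || (off >> TOY_PAGE_SHIFT) >= TOY_S2_PAGES)
        return -1;
    *idx = off >> TOY_PAGE_SHIFT;
    return 0;
}

/* Stage-2 fault: look up the slot, resolve the host page, install the
 * entry. Returns -1 when no slot backs the faulting IPA. */
static int toy_handle_s2_fault(struct toy_s2 *s2, uint64_t fault_ipa)
{
    uint64_t idx, hva;

    if (toy_ipa_to_idx(s2, fault_ipa, &idx) != 0)
        return -1;
    hva = s2->slot.base_hva + (idx << TOY_PAGE_SHIFT);
    s2->pte[idx] = toy_hva_to_pa(hva) | TOY_PTE_VALID;
    s2->faults++;
    return 0;
}

/* Guest access: fault the page in on first touch, then translate.
 * Returns 0 if the IPA cannot be mapped. */
static uint64_t toy_s2_translate(struct toy_s2 *s2, uint64_t ipa)
{
    uint64_t idx;

    if (toy_ipa_to_idx(s2, ipa, &idx) != 0)
        return 0;
    if (!(s2->pte[idx] & TOY_PTE_VALID) && toy_handle_s2_fault(s2, ipa) != 0)
        return 0;
    return (s2->pte[idx] & ~(TOY_PAGE_SIZE - 1)) | (ipa & (TOY_PAGE_SIZE - 1));
}
```

The first access to a page takes one fault and installs the entry;
later accesses to the same page translate without faulting, which is
the "slowly start to get created" behaviour from the list above.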

> > It
> > was surprisingly easy to get the external hypervisor (with very
> > minimal changes to the kvm) to the point when the guest is being
> > entered and the vmid 1 starts to refer to the instructions at the vm
> > ram base (0x4000 0000 for virt). Those, of course, currently scream
> > bloody murder as the s2 mapping does not exist.
>
> Well, *something* must be handling the fault, right? But if you've
> wrapped the host with its own S2 page tables, it may not be able to
> populate the S2 pages for the guest (wild guess...).

Let's see. Yes, the 'virt' host is already nicely wrapped and running.
To keep things simple for starters, I use a 1:1 ipa:phys mapping to
get one vm (besides the host) going.
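
By 1:1 I mean something like the following toy sketch in plain C,
where the whole range is populated up front and the output address of
every entry equals its IPA. The id_* names, the flat table and the
page-granular layout are invented for illustration and are not taken
from my hypervisor or from kvm.

```c
#include <assert.h>
#include <stdint.h>

#define ID_PAGE_SHIFT 12
#define ID_PAGE_SIZE  (1ULL << ID_PAGE_SHIFT)
#define ID_PTE_VALID  1ULL
#define ID_MAX_PAGES  256

/* Hypothetical flat stage-2 table for one contiguous guest range. */
struct id_s2 {
    uint64_t base_ipa;
    uint64_t pte[ID_MAX_PAGES];       /* output address | valid bit */
};

/* Pre-populate [base, base + size) so that output address == IPA.
 * Returns the number of pages mapped, or -1 on a misaligned or
 * oversized request (the table is left untouched in that case). */
static int id_s2_map(struct id_s2 *s2, uint64_t base, uint64_t size)
{
    uint64_t npages = size >> ID_PAGE_SHIFT;
    uint64_t i;

    if ((base | size) & (ID_PAGE_SIZE - 1))
        return -1;
    if (npages > ID_MAX_PAGES)
        return -1;

    s2->base_ipa = base;
    for (i = 0; i < npages; i++)
        s2->pte[i] = (base + i * ID_PAGE_SIZE) | ID_PTE_VALID;
    return (int)npages;
}

/* Walk the toy table; returns 0 for an unmapped IPA. */
static uint64_t id_s2_walk(const struct id_s2 *s2, uint64_t ipa)
{
    uint64_t idx;

    if (ipa < s2->base_ipa)
        return 0;
    idx = (ipa - s2->base_ipa) >> ID_PAGE_SHIFT;
    if (idx >= ID_MAX_PAGES || !(s2->pte[idx] & ID_PTE_VALID))
        return 0;
    return (s2->pte[idx] & ~(ID_PAGE_SIZE - 1)) | (ipa & (ID_PAGE_SIZE - 1));
}
```

With everything mapped up front there is no fault path to get wrong,
which is the whole attraction while bringing the first vm up.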

--
Janne