kvm.vger.kernel.org archive mirror
* Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!
@ 2020-10-11  5:26 harry harry
  2020-10-11  7:29 ` Maxim Levitsky
  0 siblings, 1 reply; 16+ messages in thread
From: harry harry @ 2020-10-11  5:26 UTC (permalink / raw)
  To: qemu-devel, mathieu.tarral, stefanha, libvir-list, kvm, pbonzini

Hi QEMU/KVM developers,

I am sorry if my email disturbs you. I did an experiment and found that
the guest physical addresses (GPAs) are not the same as the corresponding
host virtual addresses (HVAs). I am curious about why; I thought they
should be the same. I would appreciate any comments and suggestions about
1) why the GPAs and HVAs are not the same in the following experiment,
and 2) whether there are better experiments to look into the reasons. Any
other comments/suggestions are also very welcome. Thanks!

The experiment is as follows: in a single-vCPU VM, I ran a program that
allocates and references many pages (e.g., 100*1024) and does not
terminate. Then, I checked the program's guest virtual addresses (GVAs)
and GPAs by parsing its pagemap and maps files, located at
/proc/pid/pagemap and /proc/pid/maps, respectively. Finally, in the host
OS, I checked the vCPU's pagemap and maps files to find the program's
HVAs and host physical addresses (HPAs); specifically, I checked the
newly allocated physical pages in the host OS after the program had run
in the guest OS.
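
A minimal stand-alone sketch of such a pagemap lookup (not the exact code
used here; it assumes 4 KiB pages and the standard Linux pagemap layout,
and reading PFNs from pagemap typically requires root) looks like:

/* pagemap_lookup.c: translate one virtual address of a process to a
 * physical address by reading /proc/<pid>/pagemap.
 * Each pagemap entry is 64 bits: bit 63 = page present,
 * bits 0-54 = page frame number (PFN). */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <pid> <hex-virtual-address>\n", argv[0]);
        return 1;
    }

    long pid = atol(argv[1]);
    uint64_t vaddr = strtoull(argv[2], NULL, 16);
    long page_size = sysconf(_SC_PAGESIZE);

    char path[64];
    snprintf(path, sizeof(path), "/proc/%ld/pagemap", pid);
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    uint64_t entry;
    off_t offset = (vaddr / page_size) * sizeof(entry);
    if (pread(fd, &entry, sizeof(entry), offset) != sizeof(entry)) {
        perror("pread");
        return 1;
    }
    close(fd);

    if (!(entry & (1ULL << 63))) {              /* bit 63: present */
        printf("page not present\n");
        return 0;
    }

    uint64_t pfn = entry & ((1ULL << 55) - 1);  /* bits 0-54: PFN */
    printf("VA 0x%llx -> PA 0x%llx\n",
           (unsigned long long)vaddr,
           (unsigned long long)(pfn * page_size + vaddr % page_size));
    return 0;
}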

With the above experiment, I found that the program's GPAs are different
from their corresponding HVAs. BTW, Intel EPT and other related Intel
virtualization features were enabled.

Thanks,
Harry


* Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!
  2020-10-11  5:26 Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks! harry harry
@ 2020-10-11  7:29 ` Maxim Levitsky
  2020-10-11 14:11   ` harry harry
  0 siblings, 1 reply; 16+ messages in thread
From: Maxim Levitsky @ 2020-10-11  7:29 UTC (permalink / raw)
  To: harry harry, qemu-devel, mathieu.tarral, stefanha, libvir-list,
	kvm, pbonzini

On Sun, 2020-10-11 at 01:26 -0400, harry harry wrote:
> Hi QEMU/KVM developers,
> 
> I am sorry if my email disturbs you. I did an experiment and found that
> the guest physical addresses (GPAs) are not the same as the corresponding
> host virtual addresses (HVAs). I am curious about why; I thought they
> should be the same. I would appreciate any comments and suggestions about
> 1) why the GPAs and HVAs are not the same in the following experiment,
> and 2) whether there are better experiments to look into the reasons. Any
> other comments/suggestions are also very welcome. Thanks!
> 
> The experiment is as follows: in a single-vCPU VM, I ran a program that
> allocates and references many pages (e.g., 100*1024) and does not
> terminate. Then, I checked the program's guest virtual addresses (GVAs)
> and GPAs by parsing its pagemap and maps files, located at
> /proc/pid/pagemap and /proc/pid/maps, respectively. Finally, in the host
> OS, I checked the vCPU's pagemap and maps files to find the program's
> HVAs and host physical addresses (HPAs); specifically, I checked the
> newly allocated physical pages in the host OS after the program had run
> in the guest OS.
> 
> With the above experiment, I found that the program's GPAs are different
> from their corresponding HVAs. BTW, Intel EPT and other related Intel
> virtualization features were enabled.
> 
> Thanks,
> Harry
> 
The fundamental reason is that some HVAs (e.g. QEMU's virtual memory addresses) are already allocated
for QEMU's own use (e.g. QEMU code/heap/etc.) before the guest starts up.

KVM does, though, use quite an efficient way of mapping HVAs to GPAs. It uses an array of arbitrarily sized HVA areas
(which we call memslots), and for each such area/memslot you specify the GPA to map to. In theory QEMU
could allocate the whole guest's memory in one contiguous area and map it to the guest as a single memslot.
In practice there are MMIO holes, and various other reasons why there will be more than one memslot.
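
To illustrate the API only (this is a minimal sketch, not QEMU's actual
code; vm_fd is assumed to come from a KVM_CREATE_VM ioctl and the GPA/size
are whatever the caller picks), creating such a memslot from userspace
looks roughly like:

/* Sketch: back guest physical range [gpa, gpa + mem_size) with an
 * anonymous mmap'ed HVA range and register it as memslot 0. */
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

int add_memslot(int vm_fd, __u64 gpa, __u64 mem_size)
{
    void *hva = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (hva == MAP_FAILED)
        return -1;

    struct kvm_userspace_memory_region region = {
        .slot            = 0,                         /* memslot index       */
        .flags           = 0,
        .guest_phys_addr = gpa,                       /* where the guest     */
                                                      /* sees this memory    */
        .memory_size     = mem_size,
        .userspace_addr  = (__u64)(unsigned long)hva, /* the HVA backing it  */
    };

    /* KVM now translates guest accesses in [gpa, gpa + mem_size) through
     * this HVA range.  The HVA is whatever mmap() returned, so in general
     * it is different from the GPA. */
    return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}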
 
Best regards,
	Maxim Levitsky



* Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!
  2020-10-11  7:29 ` Maxim Levitsky
@ 2020-10-11 14:11   ` harry harry
  2020-10-12 16:54     ` Sean Christopherson
  0 siblings, 1 reply; 16+ messages in thread
From: harry harry @ 2020-10-11 14:11 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: qemu-devel, mathieu.tarral, stefanha, libvir-list, kvm, pbonzini

Hi Maxim,

Thanks much for your reply.

On Sun, Oct 11, 2020 at 3:29 AM Maxim Levitsky <mlevitsk@redhat.com> wrote:
>
> On Sun, 2020-10-11 at 01:26 -0400, harry harry wrote:
> > Hi QEMU/KVM developers,
> >
> > I am sorry if my email disturbs you. I did an experiment and found that
> > the guest physical addresses (GPAs) are not the same as the corresponding
> > host virtual addresses (HVAs). I am curious about why; I thought they
> > should be the same. I would appreciate any comments and suggestions about
> > 1) why the GPAs and HVAs are not the same in the following experiment,
> > and 2) whether there are better experiments to look into the reasons. Any
> > other comments/suggestions are also very welcome. Thanks!
> >
> > The experiment is as follows: in a single-vCPU VM, I ran a program that
> > allocates and references many pages (e.g., 100*1024) and does not
> > terminate. Then, I checked the program's guest virtual addresses (GVAs)
> > and GPAs by parsing its pagemap and maps files, located at
> > /proc/pid/pagemap and /proc/pid/maps, respectively. Finally, in the host
> > OS, I checked the vCPU's pagemap and maps files to find the program's
> > HVAs and host physical addresses (HPAs); specifically, I checked the
> > newly allocated physical pages in the host OS after the program had run
> > in the guest OS.
> >
> > With the above experiment, I found that the program's GPAs are different
> > from their corresponding HVAs. BTW, Intel EPT and other related Intel
> > virtualization features were enabled.
> >
> > Thanks,
> > Harry
> >
> The fundamental reason is that some HVAs (e.g. QEMU's virtual memory addresses) are already allocated
> for QEMU's own use (e.g. QEMU code/heap/etc.) before the guest starts up.
>
> KVM does, though, use quite an efficient way of mapping HVAs to GPAs. It uses an array of arbitrarily sized HVA areas
> (which we call memslots), and for each such area/memslot you specify the GPA to map to. In theory QEMU
> could allocate the whole guest's memory in one contiguous area and map it to the guest as a single memslot.
> In practice there are MMIO holes, and various other reasons why there will be more than one memslot.

It is still not clear to me why GPAs are not the same as the
corresponding HVAs in my experiment. Since two-dimensional paging
(Intel EPT) is used, GPAs should be the same as their corresponding
HVAs. Otherwise, I think EPT may not work correctly. What do you
think?

Thanks,
Harry


* Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!
  2020-10-11 14:11   ` harry harry
@ 2020-10-12 16:54     ` Sean Christopherson
  2020-10-13  4:30       ` harry harry
  2020-10-13  5:00       ` harry harry
  0 siblings, 2 replies; 16+ messages in thread
From: Sean Christopherson @ 2020-10-12 16:54 UTC (permalink / raw)
  To: harry harry
  Cc: Maxim Levitsky, qemu-devel, mathieu.tarral, stefanha,
	libvir-list, kvm, pbonzini

On Sun, Oct 11, 2020 at 10:11:39AM -0400, harry harry wrote:
> Hi Maxim,
> 
> Thanks much for your reply.
> 
> On Sun, Oct 11, 2020 at 3:29 AM Maxim Levitsky <mlevitsk@redhat.com> wrote:
> >
> > On Sun, 2020-10-11 at 01:26 -0400, harry harry wrote:
> > > Hi QEMU/KVM developers,
> > >
> > > I am sorry if my email disturbs you. I did an experiment and found that
> > > the guest physical addresses (GPAs) are not the same as the corresponding
> > > host virtual addresses (HVAs). I am curious about why; I thought they
> > > should be the same. I would appreciate any comments and suggestions about
> > > 1) why the GPAs and HVAs are not the same in the following experiment,
> > > and 2) whether there are better experiments to look into the reasons. Any
> > > other comments/suggestions are also very welcome. Thanks!
> > >
> > > The experiment is as follows: in a single-vCPU VM, I ran a program that
> > > allocates and references many pages (e.g., 100*1024) and does not
> > > terminate. Then, I checked the program's guest virtual addresses (GVAs)
> > > and GPAs by parsing its pagemap and maps files, located at
> > > /proc/pid/pagemap and /proc/pid/maps, respectively. Finally, in the host
> > > OS, I checked the vCPU's pagemap and maps files to find the program's
> > > HVAs and host physical addresses (HPAs); specifically, I checked the
> > > newly allocated physical pages in the host OS after the program had run
> > > in the guest OS.
> > >
> > > With the above experiment, I found that the program's GPAs are different
> > > from their corresponding HVAs. BTW, Intel EPT and other related Intel
> > > virtualization features were enabled.
> > >
> > > Thanks,
> > > Harry
> > >
> > The fundamental reason is that some HVAs (e.g. QEMU's virtual memory addresses) are already allocated
> > for QEMU's own use (e.g. QEMU code/heap/etc.) before the guest starts up.
> >
> > KVM does, though, use quite an efficient way of mapping HVAs to GPAs. It uses an array of arbitrarily sized HVA areas
> > (which we call memslots), and for each such area/memslot you specify the GPA to map to. In theory QEMU
> > could allocate the whole guest's memory in one contiguous area and map it to the guest as a single memslot.
> > In practice there are MMIO holes, and various other reasons why there will be more than one memslot.
> 
> It is still not clear to me why GPAs are not the same as the
> corresponding HVAs in my experiment. Since two-dimensional paging
> (Intel EPT) is used, GPAs should be the same as their corresponding
> HVAs. Otherwise, I think EPT may not work correctly. What do you
> think?

No, the guest physical address space is not intrinsically tied to the host
virtual address space.  The fact that GPAs and HVAs are related in KVM is a
property of KVM's architecture.  EPT/NPT has absolutely nothing to do with HVAs.

As Maxim pointed out, KVM links a guest's physical address space, i.e. GPAs, to
the host's virtual address space, i.e. HVAs, via memslots.  For all intents and
purposes, this is an extra layer of address translation that is purely software
defined.  The memslots allow KVM to retrieve the HPA for a given GPA when
servicing a shadow page fault (a.k.a. EPT violation).

When EPT is enabled, a shadow page fault due to an unmapped GPA will look like:

 GVA -> [guest page tables] -> GPA -> EPT Violation VM-Exit

The above walk of the guest page tables is done in hardware.  KVM then does the
following walks in software to retrieve the desired HPA:

 GPA -> [memslots] -> HVA -> [host page tables] -> HPA

KVM then takes the resulting HPA and shoves it into KVM's shadow page tables,
or when TDP is enabled, the EPT/NPT page tables.  When the guest is run with
TDP enabled, GVA->HPA translations look like the following, with all walks done
in hardware.

 GVA -> [guest page tables] -> GPA -> [extended/nested page tables] -> HPA
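
Conceptually, the memslot step above is just a software lookup.  A toy model
of the GPA -> HVA step (illustrative only, not KVM's actual data structures)
would be:

/* Toy model: each memslot maps a contiguous range of guest frame numbers
 * to a contiguous HVA range. */
#include <stddef.h>
#include <stdint.h>

struct memslot {
    uint64_t base_gfn;        /* first guest frame number covered     */
    uint64_t npages;          /* number of guest pages in the slot    */
    uint64_t userspace_addr;  /* HVA of the first byte of the slot    */
};

/* Returns the HVA backing @gpa, or 0 if no memslot covers it (the case
 * KVM treats as MMIO and exits to userspace for). */
static uint64_t gpa_to_hva(const struct memslot *slots, size_t nslots,
                           uint64_t gpa)
{
    const uint64_t page_shift = 12;          /* assume 4 KiB pages */
    uint64_t gfn = gpa >> page_shift;

    for (size_t i = 0; i < nslots; i++) {
        const struct memslot *s = &slots[i];

        if (gfn >= s->base_gfn && gfn < s->base_gfn + s->npages)
            return s->userspace_addr +
                   ((gfn - s->base_gfn) << page_shift) +
                   (gpa & ((1ULL << page_shift) - 1));
    }
    return 0;
}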


* Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!
  2020-10-12 16:54     ` Sean Christopherson
@ 2020-10-13  4:30       ` harry harry
  2020-10-13  4:52         ` Sean Christopherson
  2020-10-13  5:00       ` harry harry
  1 sibling, 1 reply; 16+ messages in thread
From: harry harry @ 2020-10-13  4:30 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Maxim Levitsky, qemu-devel, mathieu.tarral, stefanha,
	libvir-list, kvm, pbonzini

Hi Sean,

Thank you very much for your thorough explanations. Please see my
inline replies as follows. Thanks!

On Mon, Oct 12, 2020 at 12:54 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> No, the guest physical address space is not intrinsically tied to the host
> virtual address space.  The fact that GPAs and HVAs are related in KVM is a
> property of KVM's architecture.  EPT/NPT has absolutely nothing to do with HVAs.
>
> As Maxim pointed out, KVM links a guest's physical address space, i.e. GPAs, to
> the host's virtual address space, i.e. HVAs, via memslots.  For all intents and
> purposes, this is an extra layer of address translation that is purely software
> defined.  The memslots allow KVM to retrieve the HPA for a given GPA when
> servicing a shadow page fault (a.k.a. EPT violation).
>
> When EPT is enabled, a shadow page fault due to an unmapped GPA will look like:
>
>  GVA -> [guest page tables] -> GPA -> EPT Violation VM-Exit
>
> The above walk of the guest page tables is done in hardware.  KVM then does the
> following walks in software to retrieve the desired HPA:
>
>  GPA -> [memslots] -> HVA -> [host page tables] -> HPA

Do you mean that GPAs are different from their corresponding HVAs when
KVM does the walks (as you said above) in software?

Thanks,
Harry


* Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!
  2020-10-13  4:30       ` harry harry
@ 2020-10-13  4:52         ` Sean Christopherson
       [not found]           ` <CA+-xGqO4DtUs3-jH+QMPEze2GrXwtNX0z=vVUVak5HOpPKaDxQ@mail.gmail.com>
  0 siblings, 1 reply; 16+ messages in thread
From: Sean Christopherson @ 2020-10-13  4:52 UTC (permalink / raw)
  To: harry harry
  Cc: Maxim Levitsky, qemu-devel, mathieu.tarral, stefanha,
	libvir-list, kvm, pbonzini

On Tue, Oct 13, 2020 at 12:30:39AM -0400, harry harry wrote:
> Hi Sean,
> 
> Thank you very much for your thorough explanations. Please see my
> inline replies as follows. Thanks!
> 
> On Mon, Oct 12, 2020 at 12:54 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > No, the guest physical address space is not intrinsically tied to the host
> > virtual address space.  The fact that GPAs and HVAs are related in KVM is a
> > property of KVM's architecture.  EPT/NPT has absolutely nothing to do with HVAs.
> >
> > As Maxim pointed out, KVM links a guest's physical address space, i.e. GPAs, to
> > the host's virtual address space, i.e. HVAs, via memslots.  For all intents and
> > purposes, this is an extra layer of address translation that is purely software
> > defined.  The memslots allow KVM to retrieve the HPA for a given GPA when
> > servicing a shadow page fault (a.k.a. EPT violation).
> >
> > When EPT is enabled, a shadow page fault due to an unmapped GPA will look like:
> >
> >  GVA -> [guest page tables] -> GPA -> EPT Violation VM-Exit
> >
> > The above walk of the guest page tables is done in hardware.  KVM then does the
> > following walks in software to retrieve the desired HPA:
> >
> >  GPA -> [memslots] -> HVA -> [host page tables] -> HPA
> 
> Do you mean that GPAs are different from their corresponding HVAs when
> KVM does the walks (as you said above) in software?

What do you mean by "different"?  GPAs and HVAs are two completely different
address spaces.


* Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!
  2020-10-12 16:54     ` Sean Christopherson
  2020-10-13  4:30       ` harry harry
@ 2020-10-13  5:00       ` harry harry
  1 sibling, 0 replies; 16+ messages in thread
From: harry harry @ 2020-10-13  5:00 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Maxim Levitsky, qemu-devel, mathieu.tarral, stefanha,
	libvir-list, kvm, pbonzini

BTW, I still have one more question as follows. Thanks!

On Mon, Oct 12, 2020 at 12:54 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> No, the guest physical address space is not intrinsically tied to the host
> virtual address space.  The fact that GPAs and HVAs are related in KVM is a
> property of KVM's architecture.  EPT/NPT has absolutely nothing to do with HVAs.
>
> As Maxim pointed out, KVM links a guest's physical address space, i.e. GPAs, to
> the host's virtual address space, i.e. HVAs, via memslots.  For all intents and
> purposes, this is an extra layer of address translation that is purely software
> defined.  The memslots allow KVM to retrieve the HPA for a given GPA when
> servicing a shadow page fault (a.k.a. EPT violation).
>
> When EPT is enabled, a shadow page fault due to an unmapped GPA will look like:
>
>  GVA -> [guest page tables] -> GPA -> EPT Violation VM-Exit
>
> The above walk of the guest page tables is done in hardware.  KVM then does the
> following walks in software to retrieve the desired HPA:
>
>  GPA -> [memslots] -> HVA -> [host page tables] -> HPA
>
> KVM then takes the resulting HPA and shoves it into KVM's shadow page tables,
> or when TDP is enabled, the EPT/NPT page tables.  When the guest is run with
> TDP enabled, GVA->HPA translations look like the following, with all walks done
> in hardware.
>
>  GVA -> [guest page tables] -> GPA -> [extended/nested page tables] -> HPA

If I understand correctly, the hardware logic of the MMU walking ``GPA ->
[extended/nested page tables] -> HPA''[1] should be the same as ``HVA
-> [host page tables] -> HPA''[2]. If that is not true, how does KVM find the
correct HPAs when there are EPT violations?

[1] Please note that this hardware walk is the last step, which only
translates the guest physical address to the host physical address
through the four-level nested page table.
[2] Please note that this hardware walk assumes translating the HVA to
the HPA without virtualization involvement.

Thanks,
Harry


* Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!
       [not found]             ` <CA+-xGqMMa-DB1SND5MRugusDafjNA9CVw-=OBK7q=CK1impmTQ@mail.gmail.com>
@ 2020-10-13  6:43               ` Paolo Bonzini
  2020-10-13 20:36                 ` harry harry
  0 siblings, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2020-10-13  6:43 UTC (permalink / raw)
  To: harry harry, Sean Christopherson
  Cc: Maxim Levitsky, qemu-devel, mathieu.tarral, stefanha, libvir-list, kvm

On 13/10/20 07:46, harry harry wrote:
> Now, let's assume array[0]'s GPA is different from its corresponding
> HVA. I think there might be one issue like this: I think MMU's hardware
> logic to translate ``GPA ->[extended/nested page tables] -> HPA''[1]
> should be the same as ``VA-> [page tables] -> PA"[2]; if true, how does
> KVM find the correct HPA with the different HVA (e.g., array[0]'s HVA is
> not  0x0000000000000081) when there are EPT violations?

It has separate data structures that help with the translation.  These
data structures are specific to KVM for GPA to HVA translation, while
for HVA to HPA the Linux functionality is reused.

> BTW, I assume the software logic for KVM to find the HPA with a given
> HVA (as you said like below) should be the same as the hardware logic in
> MMU to translate ``GPA -> [extended/nested page tables] -> HPA''.

No, the logic to find the HPA with a given HVA is the same as the
hardware logic to translate HVA -> HPA.  That is it uses the host
"regular" page tables, not the nested page tables.

In order to translate GPA to HPA, instead, KVM does not use the nested
page tables.  It performs two steps, from GPA to HVA and from
HVA to HPA:

* for GPA to HVA it uses a custom data structure.

* for HVA to HPA it uses the host page tables as mentioned above.

This is because:

* the GPA to HVA translation is the one that is almost always
sufficient, and the nested page tables do not provide this information

* even if GPA to HPA is needed, the nested page tables are built lazily
and therefore may not always contain the requested mapping.  In addition,
using an HPA requires special steps (such as calling get_page/put_page), and
often these steps need an HVA anyway.

Paolo



* Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!
       [not found]           ` <CA+-xGqO4DtUs3-jH+QMPEze2GrXwtNX0z=vVUVak5HOpPKaDxQ@mail.gmail.com>
       [not found]             ` <CA+-xGqMMa-DB1SND5MRugusDafjNA9CVw-=OBK7q=CK1impmTQ@mail.gmail.com>
@ 2020-10-13  7:03             ` Sean Christopherson
  2020-10-13 22:40               ` harry harry
  1 sibling, 1 reply; 16+ messages in thread
From: Sean Christopherson @ 2020-10-13  7:03 UTC (permalink / raw)
  To: harry harry
  Cc: Maxim Levitsky, qemu-devel, mathieu.tarral, stefanha,
	libvir-list, kvm, pbonzini

On Tue, Oct 13, 2020 at 01:33:28AM -0400, harry harry wrote:
> > > Do you mean that GPAs are different from their corresponding HVAs when
> > > KVM does the walks (as you said above) in software?
> >
> > What do you mean by "different"?  GPAs and HVAs are two completely
> > different address spaces.
> 
> Let me give you one concrete example as follows to explain the meaning of
> ``different''.
> 
> Suppose a program is running in a single-vCPU VM. The program allocates and
> references one page (e.g., array[1024*4]). Assume that allocating and
> referencing the page in the guest OS triggers a page fault and host OS
> allocates a machine page to back it.
> 
> Assume that GVA of array[0] is 0x000000000021 and its corresponding GPA is
> 0x0000000000000081. I think array[0]'s corresponding HVA should also be
> 0x0000000000000081, which is the same as array[0]'s GPA. If array[0]'s HVA
> is not 0x0000000000000081, array[0]'s GPA is *different* from its
> corresponding HVA.
> 
> Now, let's assume array[0]'s GPA is different from its corresponding HVA. I
> think there might be one issue like this: I think MMU's hardware logic to
> translate ``GPA ->[extended/nested page tables] -> HPA''[1] should be the
> same as ``VA-> [page tables] -> PA"[2]; if true, how does KVM find the
> correct HPA with the different HVA (e.g., array[0]'s HVA is not
> 0x0000000000000081) when there are EPT violations?

This is where memslots come in.  Think of memslots as a one-level page table
that translates GPAs to HVAs.  A memslot, set by userspace, tells KVM the
corresponding HVA for a given GPA.

Before the guest is running (assuming host userspace isn't broken), the
userspace VMM will first allocate virtual memory (HVA) for all physical
memory it wants to map into the guest (GPA).  It then tells KVM how to
translate a given GPA to its HVA by creating a memslot.

To avoid getting lost in a tangent about page offsets, let's assume array[0]'s
GPA = 0xa000.  For KVM to create a GPA->HPA mapping for the guest, there _must_
be a memslot that translates GPA 0xa000 to an HVA[*].  Let's say HVA = 0xb000.

On an EPT violation, KVM does a memslot lookup to translate the GPA (0xa000) to
its HVA (0xb000), and then walks the host page tables to translate the HVA into
an HPA (let's say that ends up being 0xc000).  KVM then stuffs 0xc000 into the
EPT tables, which yields:

  GPA    -> HVA    (KVM memslots)
  0xa000    0xb000

  HVA    -> HPA    (host page tables)
  0xb000    0xc000

  GPA    -> HPA    (extended page tables)
  0xa000    0xc000

To keep the EPT tables synchronized with the host page tables, if HVA->HPA
changes, e.g. HVA 0xb000 is remapped to HPA 0xd000, then KVM will get notified
by the host kernel that the HVA has been unmapped, and will find (again via
memslots) and unmap the corresponding GPA->HPA translations.

Ditto for the case where userspace moves a memslot, e.g. if HVA is changed
to 0xe000, KVM will first unmap all old GPA->HPA translations so that accesses
to GPA 0xa000 from the guest will take an EPT violation and see the new HVA
(and presumably a new HPA).

[*] If there is no memslot, KVM will exit to userspace on the EPT violation,
    with some information about what GPA the guest was accessing.  This is how
    emulated MMIO is implemented, e.g. userspace intentionally doesn't back a
    GPA with a memslot so that it can trap guest accesses to said GPA for the
    purpose of emulating a device.
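
Putting the above together, the two software steps can be modeled like this
(a toy sketch using the made-up 0xa000/0xb000/0xc000 addresses, not actual
KVM code):

#include <stdint.h>
#include <stdio.h>

/* Step 1: GPA -> HVA via a single made-up memslot. */
static uint64_t memslot_gpa_to_hva(uint64_t gpa)
{
    const uint64_t slot_gpa = 0xa000, slot_hva = 0xb000, slot_size = 0x1000;

    if (gpa >= slot_gpa && gpa < slot_gpa + slot_size)
        return slot_hva + (gpa - slot_gpa);
    return 0;    /* no memslot: KVM would exit to userspace (MMIO case) */
}

/* Step 2: HVA -> HPA; stands in for walking the host page tables. */
static uint64_t host_hva_to_hpa(uint64_t hva)
{
    return hva == 0xb000 ? 0xc000 : 0;
}

int main(void)
{
    uint64_t gpa = 0xa000;
    uint64_t hva = memslot_gpa_to_hva(gpa);
    uint64_t hpa = host_hva_to_hpa(hva);

    /* KVM would now install GPA 0xa000 -> HPA 0xc000 in the EPT tables. */
    printf("GPA 0x%llx -> HVA 0x%llx -> HPA 0x%llx\n",
           (unsigned long long)gpa, (unsigned long long)hva,
           (unsigned long long)hpa);
    return 0;
}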

> [1] Please note that this hardware walk is the last step, which only
> translates the guest physical address to the host physical address through
> the four-level nested page table.
> [2] Please note that this hardware walk assumes translating the VA to the
> PA without virtualization involvement.
> 
> Please note that the above addresses are not real and just use for
> explanations.
> 
> Thanks,
> Harry


* Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!
  2020-10-13  6:43               ` Paolo Bonzini
@ 2020-10-13 20:36                 ` harry harry
  2020-10-14  8:27                   ` Paolo Bonzini
  2020-10-14  8:29                   ` Maxim Levitsky
  0 siblings, 2 replies; 16+ messages in thread
From: harry harry @ 2020-10-13 20:36 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: Maxim Levitsky, qemu-devel, mathieu.tarral, stefanha, libvir-list, kvm

Hi Paolo and Sean,

Thanks much for your prompt replies and clear explanations.

On Tue, Oct 13, 2020 at 2:43 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> No, the logic to find the HPA with a given HVA is the same as the
> hardware logic to translate HVA -> HPA.  That is it uses the host
> "regular" page tables, not the nested page tables.
>
> In order to translate GPA to HPA, instead, KVM does not use the nested
> page tables.

I am curious why KVM does not directly use GPAs as HVAs and leverage
nested page tables to translate HVAs (i.e., GPAs) to HPAs? Is that
because 1) the hardware logic of ``GPA -> [extended/nested page
tables] -> HPA[*]'' is different[**] from the hardware logic of ``HVA
-> [host regular page tables] -> HPA''; 2) if 1) is true, it is
natural to reuse Linux's original functionality to translate HVAs to
HPAs through regular page tables.

[*]: Here, the translation means the last step for MMU to translate a
GVA's corresponding GPA to an HPA through the extended/nested page
tables.
[**]: To my knowledge, the hardware logic of ``GPA -> [extended/nested
page tables] -> HPA'' seems to be the same as the hardware logic of
``HVA -> [host regular page tables] -> HPA''. I appreciate it if you
could point out the differences I ignored. Thanks!

Best,
Harry


* Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!
  2020-10-13  7:03             ` Sean Christopherson
@ 2020-10-13 22:40               ` harry harry
  2020-10-14  8:28                 ` Paolo Bonzini
  0 siblings, 1 reply; 16+ messages in thread
From: harry harry @ 2020-10-13 22:40 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Maxim Levitsky, qemu-devel, mathieu.tarral, stefanha,
	libvir-list, kvm, pbonzini

Hi Sean,

Thanks much for your detailed replies. It's now clear to me why GPAs are
different from HVAs in QEMU/KVM. Thanks! I would appreciate it if you could
help with the following two more questions.

On Tue, Oct 13, 2020 at 3:03 AM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> This is where memslots come in.  Think of memslots as a one-level page table
> that translates GPAs to HVAs.  A memslot, set by userspace, tells KVM the
> corresponding HVA for a given GPA.
>
> Before the guest is running (assuming host userspace isn't broken), the
> userspace VMM will first allocate virtual memory (HVA) for all physical
> memory it wants to map into the guest (GPA).  It then tells KVM how to
> translate a given GPA to its HVA by creating a memslot.
>
> To avoid getting lost in a tangent about page offsets, let's assume array[0]'s
> GPA = 0xa000.  For KVM to create a GPA->HPA mapping for the guest, there _must_
> be a memslot that translates GPA 0xa000 to an HVA[*].  Let's say HVA = 0xb000.
>
> On an EPT violation, KVM does a memslot lookup to translate the GPA (0xa000) to
> its HVA (0xb000), and then walks the host page tables to translate the HVA into
> an HPA (let's say that ends up being 0xc000).  KVM then stuffs 0xc000 into the
> EPT tables, which yields:
>
>   GPA    -> HVA    (KVM memslots)
>   0xa000    0xb000
>
>   HVA    -> HPA    (host page tables)
>   0xb000    0xc000
>
>   GPA    -> HPA    (extended page tables)
>   0xa000    0xc000
>
> To keep the EPT tables synchronized with the host page tables, if HVA->HPA
> changes, e.g. HVA 0xb000 is remapped to HPA 0xd000, then KVM will get notified
> by the host kernel that the HVA has been unmapped, and will find (again via
> memslots) and unmap the corresponding GPA->HPA translations.
>
> Ditto for the case where userspace moves a memslot, e.g. if HVA is changed
> to 0xe000, KVM will first unmap all old GPA->HPA translations so that accesses
> to GPA 0xa000 from the guest will take an EPT violation and see the new HVA
> (and presumably a new HPA).

Q1: Is there any file like ``/proc/pid/pagemap'' to record the
mappings between GPAs and HVAs in the host OS?

Q2: It seems that there might be extra overhead (e.g., synchronization
between the EPT tables and the host's regular page tables; maintaining
extra regular page tables and data structures) caused by the extra
translation from GPAs to HVAs via memslots. Why doesn't KVM directly use
GPAs as HVAs and leverage the extended/nested page tables to translate
HVAs (i.e., GPAs) to HPAs?

Thanks,
Harry


* Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!
  2020-10-13 20:36                 ` harry harry
@ 2020-10-14  8:27                   ` Paolo Bonzini
  2020-10-14  8:29                   ` Maxim Levitsky
  1 sibling, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2020-10-14  8:27 UTC (permalink / raw)
  To: harry harry, Sean Christopherson
  Cc: Maxim Levitsky, qemu-devel, mathieu.tarral, stefanha, libvir-list, kvm

On 13/10/20 22:36, harry harry wrote:
> Hi Paolo and Sean,
> 
> Thanks much for your prompt replies and clear explanations.
> 
> On Tue, Oct 13, 2020 at 2:43 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> No, the logic to find the HPA with a given HVA is the same as the
>> hardware logic to translate HVA -> HPA.  That is it uses the host
>> "regular" page tables, not the nested page tables.
>>
>> In order to translate GPA to HPA, instead, KVM does not use the nested
>> page tables.
> 
> I am curious why KVM does not directly use GPAs as HVAs and leverage
> nested page tables to translate HVAs (i.e., GPAs) to HPAs?

GPAs and HVAs are different things.  In fact I'm not aware of any
hypervisor that uses HVA==GPA.  On 32-bit x86 systems HVAs are 32-bit
(obviously) but GPAs are 36-bit.

In the case of KVM, HVAs are controlled by the rest of Linux; for
example, when you do "mmap" to allocate guest memory you cannot ask the
OS to return the guest memory at the exact HVA that is needed by the
guest.  There could be something else at that HVA (or you don't want
anything at that HVA: GPA 0 is valid, but HVA 0 is the NULL pointer!).
There are also cases where the same memory appears in multiple places in
the guest memory map (aliasing).
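
As a small illustration (a sketch with a made-up hint address): even asking
mmap for a specific HVA is only a hint, so userspace cannot force guest
memory to end up at HVA == GPA:

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* Pretend the guest needs RAM at GPA 0x100000 and we wanted HVA == GPA. */
    void *hint = (void *)0x100000;
    void *hva  = mmap(hint, 4096, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* Without MAP_FIXED the kernel is free to place the mapping elsewhere,
     * e.g. because something else already occupies the requested range. */
    printf("requested %p, got %p\n", hint, hva);
    return 0;
}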

Paolo



* Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!
  2020-10-13 22:40               ` harry harry
@ 2020-10-14  8:28                 ` Paolo Bonzini
  2020-10-15  3:43                   ` harry harry
  0 siblings, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2020-10-14  8:28 UTC (permalink / raw)
  To: harry harry, Sean Christopherson
  Cc: Maxim Levitsky, qemu-devel, mathieu.tarral, stefanha, libvir-list, kvm

On 14/10/20 00:40, harry harry wrote:
> Q1: Is there any file like ``/proc/pid/pagemap'' to record the
> mappings between GPAs and HVAs in the host OS?

No, there isn't.

> Q2: It seems that there might be extra overhead (e.g., synchronization
> between the EPT tables and the host's regular page tables; maintaining
> extra regular page tables and data structures) caused by the extra
> translation from GPAs to HVAs via memslots. Why doesn't KVM directly use
> GPAs as HVAs and leverage the extended/nested page tables to translate
> HVAs (i.e., GPAs) to HPAs?

See my other answer.  What you are saying is simply not possible.

Paolo



* Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!
  2020-10-13 20:36                 ` harry harry
  2020-10-14  8:27                   ` Paolo Bonzini
@ 2020-10-14  8:29                   ` Maxim Levitsky
  2020-10-15  3:45                     ` harry harry
  1 sibling, 1 reply; 16+ messages in thread
From: Maxim Levitsky @ 2020-10-14  8:29 UTC (permalink / raw)
  To: harry harry, Paolo Bonzini, Sean Christopherson
  Cc: qemu-devel, mathieu.tarral, stefanha, libvir-list, kvm

On Tue, 2020-10-13 at 16:36 -0400, harry harry wrote:
> Hi Paolo and Sean,
> 
> Thanks much for your prompt replies and clear explanations.
> 
> On Tue, Oct 13, 2020 at 2:43 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > No, the logic to find the HPA with a given HVA is the same as the
> > hardware logic to translate HVA -> HPA.  That is it uses the host
> > "regular" page tables, not the nested page tables.
> > 
> > In order to translate GPA to HPA, instead, KVM does not use the nested
> > page tables.
> 
> I am curious why KVM does not directly use GPAs as HVAs and leverage
> nested page tables to translate HVAs (i.e., GPAs) to HPAs? Is that
> because 1) the hardware logic of ``GPA -> [extended/nested page
> tables] -> HPA[*]'' is different[**] from the hardware logic of ``HVA
> -> [host regular page tables] -> HPA''; 2) if 1) is true, it is
> natural to reuse Linux's original functionality to translate HVAs to
> HPAs through regular page tables.
I would like to emphasize this again. The HVA space is not fully free when a guest starts,
since it contains QEMU's heap, code, data, and whatever else QEMU needs. However, the
guest's GPA space must be fully allocatable. E.g. if QEMU's heap starts at 0x40000, then
under your suggestion the guest couldn't have physical memory at 0x40000, which is wrong.
It could in theory be done by blacklisting these areas via an ACPI/BIOS-provided memory
map, but that would be very difficult to maintain and not worth it.
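
A trivial way to see this (just an illustration, unrelated to KVM itself) is to dump a
process's own /proc/self/maps; every line printed is an HVA range that is already
occupied before any guest RAM could be placed there:

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[512];

    if (!f) {
        perror("fopen");
        return 1;
    }
    while (fgets(line, sizeof(line), f))
        fputs(line, stdout);   /* each line: an already-used HVA range */
    fclose(f);
    return 0;
}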

Best regards,
	Maxim Levitsky

> 
> [*]: Here, the translation means the last step for MMU to translate a
> GVA's corresponding GPA to an HPA through the extended/nested page
> tables.
> [**]: To my knowledge, the hardware logic of ``GPA -> [extended/nested
> page tables] -> HPA'' seems to be the same as the hardware logic of
> ``HVA -> [host regular page tables] -> HPA''. I appreciate it if you
> could point out the differences I ignored. Thanks!
> 
> Best,
> Harry
> 




* Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!
  2020-10-14  8:28                 ` Paolo Bonzini
@ 2020-10-15  3:43                   ` harry harry
  0 siblings, 0 replies; 16+ messages in thread
From: harry harry @ 2020-10-15  3:43 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Maxim Levitsky, qemu-devel, mathieu.tarral,
	stefanha, libvir-list, kvm

Hi Paolo and Sean,

It is clear to me now. Thanks much for your reply and help.


Best regards,
Harry


* Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!
  2020-10-14  8:29                   ` Maxim Levitsky
@ 2020-10-15  3:45                     ` harry harry
  0 siblings, 0 replies; 16+ messages in thread
From: harry harry @ 2020-10-15  3:45 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Sean Christopherson, qemu-devel, mathieu.tarral,
	stefanha, libvir-list, kvm

Hi Maxim,

Thanks for your emphasis. It's much clearer.

Best,
Harry



Thread overview: 16+ messages
2020-10-11  5:26 Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks! harry harry
2020-10-11  7:29 ` Maxim Levitsky
2020-10-11 14:11   ` harry harry
2020-10-12 16:54     ` Sean Christopherson
2020-10-13  4:30       ` harry harry
2020-10-13  4:52         ` Sean Christopherson
     [not found]           ` <CA+-xGqO4DtUs3-jH+QMPEze2GrXwtNX0z=vVUVak5HOpPKaDxQ@mail.gmail.com>
     [not found]             ` <CA+-xGqMMa-DB1SND5MRugusDafjNA9CVw-=OBK7q=CK1impmTQ@mail.gmail.com>
2020-10-13  6:43               ` Paolo Bonzini
2020-10-13 20:36                 ` harry harry
2020-10-14  8:27                   ` Paolo Bonzini
2020-10-14  8:29                   ` Maxim Levitsky
2020-10-15  3:45                     ` harry harry
2020-10-13  7:03             ` Sean Christopherson
2020-10-13 22:40               ` harry harry
2020-10-14  8:28                 ` Paolo Bonzini
2020-10-15  3:43                   ` harry harry
2020-10-13  5:00       ` harry harry
