From: Alexander Graf <agraf@suse.de>
To: Paul Mackerras <paulus@samba.org>
Cc: "kvm-ppc@vger.kernel.org" <kvm-ppc@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: [PATCH 4/5] KVM: PPC: Book3S HV: Don't give the guest RW access to RO pages
Date: Mon, 26 Nov 2012 14:09:11 +0100
Message-ID: <C55258FE-94CF-4718-9707-5A3C6F3AAC16@suse.de>
In-Reply-To: <20121124093237.GF23537@bloggs.ozlabs.ibm.com>


On 24.11.2012, at 10:32, Paul Mackerras wrote:

> On Sat, Nov 24, 2012 at 10:05:37AM +0100, Alexander Graf wrote:
>> 
>> 
>> On 23.11.2012, at 23:13, Paul Mackerras <paulus@samba.org> wrote:
>> 
>>> On Fri, Nov 23, 2012 at 04:47:45PM +0100, Alexander Graf wrote:
>>>> 
>>>> On 22.11.2012, at 10:28, Paul Mackerras wrote:
>>>> 
>>>>> Currently, if the guest does an H_PROTECT hcall requesting that the
>>>>> permissions on a HPT entry be changed to allow writing, we make the
>>>>> requested change even if the page is marked read-only in the host
>>>>> Linux page tables.  This is a problem since it would for instance
>>>>> allow a guest to modify a page that KSM has decided can be shared
>>>>> between multiple guests.
>>>>> 
>>>>> To fix this, if the new permissions for the page allow writing, we need
>>>>> to look up the memslot for the page, work out the host virtual address,
>>>>> and look up the Linux page tables to get the PTE for the page.  If that
>>>>> PTE is read-only, we reduce the HPTE permissions to read-only.
>>>> 
>>>> How does KSM handle this usually? If you reduce the permissions to R/O, how do you ever get a R/W page from a deduplicated one?
>>> 
>>> The scenario goes something like this:
>>> 
>>> 1. Guest creates an HPTE with RO permissions.
>>> 2. KSM decides the page is identical to another page and changes the
>>>  HPTE to point to a shared copy.  Permissions are still RO.
>>> 3. Guest decides it wants write access to the page and does an
>>>  H_PROTECT hcall to change the permissions on the HPTE to RW.
>>> 
>>> The bug is that we actually make the requested change in step 3.
>>> Instead we should leave it at RO, then when the guest tries to write
>>> to the page, we take a hypervisor page fault, copy the page and give
>>> the guest write access to its own copy of the page.
>>> 
>>> So what this patch does is add code to H_PROTECT so that if the guest
>>> is requesting RW access, we check the Linux PTE to see if the
>>> underlying guest page is RO, and if so reduce the permissions in the
>>> HPTE to RO.
>> 
>> But this will be guest visible, because now H_PROTECT doesn't actually mark the page R/W in the HTAB, right?
> 
> No - the guest view of the HPTE has R/W permissions.  The guest view
> of the HPTE is made up of doubleword 0 from the real HPT plus
> rev->guest_rpte for doubleword 1 (where rev is the entry in the revmap
> array, kvm->arch.revmap, for the HPTE).  The guest view can be
> different from the host/hardware view, which is in the real HPT.  For
> instance, the guest view of a HPTE might be valid but the host view
> might be invalid because the underlying real page has been paged out -
> in that case we use a software bit which we call HPTE_V_ABSENT to
> remind ourselves that there is something valid there from the guest's
> point of view.  Or the guest view can be R/W but the host view is RO,
> as in the case where KSM has merged the page.
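
A minimal sketch of how the guest view described above could be assembled, for illustration only. The struct names, the toy_ prefix and the bit values are placeholders, not the kernel's definitions, and turning the absent bit back into a valid bit is an assumption about how the reminder is used:

```c
#include <stdint.h>

/* Placeholder bit values for this sketch only; not the kernel's definitions. */
#define TOY_V_VALID  0x1ULL
#define TOY_V_ABSENT 0x2ULL   /* stands in for the software bit HPTE_V_ABSENT */

/* Real HPT entry, i.e. the host/hardware view (hypothetical layout). */
struct toy_hpte {
	uint64_t v;   /* doubleword 0 */
	uint64_t r;   /* doubleword 1 */
};

/* Revmap entry: doubleword 1 as the guest asked for it (cf. rev->guest_rpte). */
struct toy_rev {
	uint64_t guest_rpte;
};

/* Guest view = doubleword 0 from the real HPT + doubleword 1 from the revmap. */
static struct toy_hpte toy_guest_view(const struct toy_hpte *real,
				      const struct toy_rev *rev)
{
	struct toy_hpte view;

	view.v = real->v;
	if (view.v & TOY_V_ABSENT) {
		/* Paged out by the host, but still valid from the guest's side. */
		view.v = (view.v & ~TOY_V_ABSENT) | TOY_V_VALID;
	}
	view.r = rev->guest_rpte;   /* may grant R/W while real->r is still RO */
	return view;
}
```

With a real entry whose doubleword 0 has only the absent bit set, the returned view has the valid bit set and carries the revmap's doubleword 1 rather than the hardware one.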
> 
>> So the flow with this patch is:
>> 
>>  - guest page permission fault
> 
> This comes through the host (kvmppc_hpte_hv_fault()) which looks at
> the guest view of the HPTE, sees that it has RO permissions, and sends
> the page fault to the guest.
> 
>>  - guest does H_PROTECT to mark page r/w
>>  - H_PROTECT doesn't do anything
>>  - guest returns from permission handler, triggers write fault
> 
> This comes once again to kvmppc_hpte_hv_fault(), which sees that the
> guest view of the HPTE has R/W permissions now, and sends the page
> fault to kvmppc_book3s_hv_page_fault(), which requests write access to
> the page, possibly triggering copy-on-write or whatever, and updates
> the real HPTE to have R/W permissions and possibly point to a new page
> of memory.
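
The two fault passes described above can be modelled roughly as follows. This is a toy state machine, not kernel code: permissions are reduced to a single writable flag per view, and every toy_ name is hypothetical:

```c
#include <stdbool.h>

/* Where a write fault ends up in this toy model. */
enum toy_fault_route {
	TOY_REFLECT_TO_GUEST,   /* guest view is RO: the guest handles the fault */
	TOY_HOST_PAGE_FAULT     /* guest view is R/W: the host fixes up the real HPTE */
};

/* Permissions of one HPTE as seen from each side (hypothetical layout). */
struct toy_perm {
	bool guest_writable;   /* guest view, cf. rev->guest_rpte */
	bool host_writable;    /* host/hardware view in the real HPT */
};

/* H_PROTECT: record what the guest asked for, clamp what the hardware gets. */
static void toy_h_protect(struct toy_perm *e, bool want_write,
			  bool host_pte_writable)
{
	e->guest_writable = want_write;
	e->host_writable = want_write && host_pte_writable;
}

/* Routing decision for a write fault, based on the guest view only. */
static enum toy_fault_route toy_route_write_fault(const struct toy_perm *e)
{
	return e->guest_writable ? TOY_HOST_PAGE_FAULT : TOY_REFLECT_TO_GUEST;
}

/* Host page fault: stands in for requesting write access (COW and so on). */
static void toy_host_page_fault(struct toy_perm *e)
{
	e->host_writable = true;
}
```

Starting from an entry that is RO in both views, the first write fault is reflected to the guest. After toy_h_protect() with a read-only host PTE, only the guest view becomes writable, so the second fault is routed to the host, which then upgrades the hardware view.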
> 
>> 
>> 2 questions here:
>> 
>> How does the host know that the page is actually r/w?
> 
> I assume you mean RO?  It looks up the memslot for the guest physical
> address (which it gets from rev->guest_rpte), uses that to work out
> the host virtual address (i.e. the address in qemu's address space),
> looks up the Linux PTE in qemu's Linux page tables, and looks at the
> _PAGE_RW bit there.
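
The lookup chain above (guest physical address, memslot, host virtual address, Linux PTE, write bit) might look like this in a userspace toy. Flat arrays replace the real memslot and page table structures, a 4 KiB page size is assumed, and TOY_PAGE_RW is only a placeholder for _PAGE_RW:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12                      /* assumed 4 KiB pages */
#define TOY_PAGE_SIZE  (1ULL << TOY_PAGE_SHIFT)
#define TOY_PAGE_RW    0x1u                    /* placeholder for _PAGE_RW */

/* Hypothetical memslot: a run of guest frames backed by qemu address space. */
struct toy_memslot {
	uint64_t base_gfn;
	uint64_t npages;
	uint64_t userspace_addr;   /* host virtual address of the first page */
};

/* Hypothetical flat "page table": one entry per mapped host page. */
struct toy_pte {
	uint64_t page_hva;
	unsigned int flags;
};

/* Guest physical address -> host virtual address via the owning memslot. */
static bool toy_gpa_to_hva(const struct toy_memslot *slots, size_t nslots,
			   uint64_t gpa, uint64_t *hva)
{
	uint64_t gfn = gpa >> TOY_PAGE_SHIFT;

	for (size_t i = 0; i < nslots; i++) {
		const struct toy_memslot *s = &slots[i];

		if (gfn >= s->base_gfn && gfn - s->base_gfn < s->npages) {
			*hva = s->userspace_addr +
			       ((gfn - s->base_gfn) << TOY_PAGE_SHIFT) +
			       (gpa & (TOY_PAGE_SIZE - 1));
			return true;
		}
	}
	return false;   /* no memslot covers this address */
}

/* Host virtual address -> is the backing PTE writable? */
static bool toy_hva_writable(const struct toy_pte *ptes, size_t nptes,
			     uint64_t hva)
{
	uint64_t page = hva & ~(TOY_PAGE_SIZE - 1);

	for (size_t i = 0; i < nptes; i++) {
		if (ptes[i].page_hva == page)
			return (ptes[i].flags & TOY_PAGE_RW) != 0;
	}
	return false;   /* no PTE found: treat the page as not writable */
}
```

If toy_hva_writable() returns false for the page behind an HPTE, that is the case where the hardware view would be kept RO even though the guest asked for R/W.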
> 
>> How does this work on 970? I thought page faults always go straight to the guest there.
> 
> They do, which is why PPC970 can't do any of this.  On PPC970 we have
> kvm->arch.using_mmu_notifiers == 0, and that makes the code pin every
> page of guest memory that is mapped by a guest HPTE (with a Linux
> guest, that means every page, because of the linear mapping).  On
> POWER7 we have kvm->arch.using_mmu_notifiers == 1, which enables
> host paging and deduplication of guest memory.

Thanks a lot for the detailed explanation! Maybe you guys should just release an HV-capable POWER7 system publicly, so we can deprecate 970 support. That would make a few things quite a bit easier ;)

Thanks, applied to kvm-ppc-next.

Alex
