All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christoffer Dall <christoffer.dall@linaro.org>
To: Mario Smarduch <m.smarduch@samsung.com>
Cc: kvmarm@lists.cs.columbia.edu, marc.zyngier@arm.com,
	pbonzini@redhat.com, gleb@kernel.org, agraf@suse.de,
	xiantao.zhang@intel.com, borntraeger@de.ibm.com,
	cornelia.huck@de.ibm.com, xiaoguangrong@linux.vnet.ibm.com,
	steve.capper@arm.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, jays.lee@samsung.com,
	sungjinn.chung@samsung.com
Subject: Re: [PATCH v9 4/4] arm: ARMv7 dirty page logging 2nd stage page fault handling support
Date: Mon, 18 Aug 2014 14:51:05 +0200	[thread overview]
Message-ID: <20140818125105.GE19635@cbox> (raw)
In-Reply-To: <53EC0ED3.8070307@samsung.com>

On Wed, Aug 13, 2014 at 06:20:19PM -0700, Mario Smarduch wrote:
> On 08/13/2014 12:30 AM, Christoffer Dall wrote:
> > On Tue, Aug 12, 2014 at 06:27:11PM -0700, Mario Smarduch wrote:
> >> On 08/12/2014 02:50 AM, Christoffer Dall wrote:
> >>> On Mon, Aug 11, 2014 at 06:25:05PM -0700, Mario Smarduch wrote:
> >>>> On 08/11/2014 12:13 PM, Christoffer Dall wrote:
> >>>>> On Thu, Jul 24, 2014 at 05:56:08PM -0700, Mario Smarduch wrote:
> > 
> > [...]
> > 
> >>>>>> @@ -1151,7 +1170,7 @@ static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
> >>>>>>  {
> >>>>>>  	pte_t *pte = (pte_t *)data;
> >>>>>>  
> >>>>>> -	stage2_set_pte(kvm, NULL, gpa, pte, false);
> >>>>>> +	stage2_set_pte(kvm, NULL, gpa, pte, false, false);
> >>>>>
> >>>>> why is logging never active if we are called from MMU notifiers?
> >>>>
> >>>> mmu notifiers update sptes, but I don't see how these updates
> >>>> can result in guest dirty pages. Also guest pages are marked dirty
> >>>> from 2nd stage page fault handlers (searching through the code).
> >>>>
> >>> Ok, then add:
> >>>
> >>> /*
> >>>  * We can always call stage2_set_pte with logging_active == false,
> >>>  * because MMU notifiers will have unmapped a huge PMD before calling
> >>>  * ->change_pte() (which in turn calls kvm_set_spte_hva()) and therefore
> >>>  * stage2_set_pte() never needs to clear out a huge PMD through this
> >>>  * calling path.
> >>>  */
> >>
> >> So here on permission change to primary pte's kernel first invalidates
> >> related s2ptes followed by ->change_pte calls to synchronize s2ptes. As
> >> consequence of invalidation we unmap huge PMDs, if a page falls in that
> >> range.
> >>
> >> Is the comment to point out use of logging flag under various scenarios?
> > 
> > The comment is because when you look at this function it is not obvious
> > why we pass logging_active=false, despite logging may actually be
> > active.  This could suggest that the parameter to stage2_set_pte()
> > should be named differently (break_huge_pmds) or something like that,
> > but we can also be satisfied with the comment.
> 
> Ok I see, I was thinking you thought it was breaking something.
> Yeah I'll add the comment, in reality this is another use case
> where a PMD may need to be converted to page table so it makes sense
> to contrast use cases.
> 

the hidden knowledge is that MMU notifiers will ensure a huge PMD gets
unmapped before trying to change the physical backing of the underlying
PTEs, so it's a gigantic kernel bug if this gets called on something
mapped with a huge PMD.


> > 
> >>
> >> Should I add comments on flag use in other places as well?
> >>
> > 
> > It's always a judgement call.  I didn't find it necessarry to put a
> > comment elsewhere because I think it's pretty obivous that we would
> > never care about logging writes to device regions.
> > 
> > However, this made me think, are we making sure that we are not marking
> > device mappings as read-only in the wp_range functions?  I think it's
> 
> KVM_SET_USER_MEMORY_REGION ioctl doesn't check type of region being
> installed/operated on (KVM_MEM_LOG_DIRTY_PAGES), in case of QEMU
> these regions wind up in KVMState->KVMSlot[], when
> memory_region_add_subregion() is called KVM listener installs it.
> For migration and dirty page logging QEMU walks the KVMSlot[] array.
> 
> For QEMU VFIO (PCI) mmap()ing BAR of type IORESOURCE_MEM,
> causes the memory region to be added to KVMState->KVMSlot[].
> In that case it's possible to walk KVMState->KVMSlot[] issue
> the ioctl and  come across  a device mapping with normal memory and
> WP it's s2ptes (VFIO sets unmigrateble state though).
> 
> But I'm not sure what's there to stop someone calling the ioctl and
> install a region with device memory type. Most likely though if you
> installed that kind of region migration would be disabled.
> 
> But just for logging use not checking memory type could be an issue.
> 
I forgot that the current write-protect'ing is limited to the memory
region boundaries, so everything should be fine.

If user-space write-protects device memory regions, the worst
consequence is that it breaks the guest, but that's its own
responsibility, so I don't think you need to change anything.

-Christoffer

WARNING: multiple messages have this Message-ID (diff)
From: christoffer.dall@linaro.org (Christoffer Dall)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v9 4/4] arm: ARMv7 dirty page logging 2nd stage page fault handling support
Date: Mon, 18 Aug 2014 14:51:05 +0200	[thread overview]
Message-ID: <20140818125105.GE19635@cbox> (raw)
In-Reply-To: <53EC0ED3.8070307@samsung.com>

On Wed, Aug 13, 2014 at 06:20:19PM -0700, Mario Smarduch wrote:
> On 08/13/2014 12:30 AM, Christoffer Dall wrote:
> > On Tue, Aug 12, 2014 at 06:27:11PM -0700, Mario Smarduch wrote:
> >> On 08/12/2014 02:50 AM, Christoffer Dall wrote:
> >>> On Mon, Aug 11, 2014 at 06:25:05PM -0700, Mario Smarduch wrote:
> >>>> On 08/11/2014 12:13 PM, Christoffer Dall wrote:
> >>>>> On Thu, Jul 24, 2014 at 05:56:08PM -0700, Mario Smarduch wrote:
> > 
> > [...]
> > 
> >>>>>> @@ -1151,7 +1170,7 @@ static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
> >>>>>>  {
> >>>>>>  	pte_t *pte = (pte_t *)data;
> >>>>>>  
> >>>>>> -	stage2_set_pte(kvm, NULL, gpa, pte, false);
> >>>>>> +	stage2_set_pte(kvm, NULL, gpa, pte, false, false);
> >>>>>
> >>>>> why is logging never active if we are called from MMU notifiers?
> >>>>
> >>>> mmu notifiers update sptes, but I don't see how these updates
> >>>> can result in guest dirty pages. Also guest pages are marked dirty
> >>>> from 2nd stage page fault handlers (searching through the code).
> >>>>
> >>> Ok, then add:
> >>>
> >>> /*
> >>>  * We can always call stage2_set_pte with logging_active == false,
> >>>  * because MMU notifiers will have unmapped a huge PMD before calling
> >>>  * ->change_pte() (which in turn calls kvm_set_spte_hva()) and therefore
> >>>  * stage2_set_pte() never needs to clear out a huge PMD through this
> >>>  * calling path.
> >>>  */
> >>
> >> So here on permission change to primary pte's kernel first invalidates
> >> related s2ptes followed by ->change_pte calls to synchronize s2ptes. As
> >> consequence of invalidation we unmap huge PMDs, if a page falls in that
> >> range.
> >>
> >> Is the comment to point out use of logging flag under various scenarios?
> > 
> > The comment is because when you look at this function it is not obvious
> > why we pass logging_active=false, despite logging may actually be
> > active.  This could suggest that the parameter to stage2_set_pte()
> > should be named differently (break_huge_pmds) or something like that,
> > but we can also be satisfied with the comment.
> 
> Ok I see, I was thinking you thought it was breaking something.
> Yeah I'll add the comment, in reality this is another use case
> where a PMD may need to be converted to page table so it makes sense
> to contrast use cases.
> 

the hidden knowledge is that MMU notifiers will ensure a huge PMD gets
unmapped before trying to change the physical backing of the underlying
PTEs, so it's a gigantic kernel bug if this gets called on something
mapped with a huge PMD.


> > 
> >>
> >> Should I add comments on flag use in other places as well?
> >>
> > 
> > It's always a judgement call.  I didn't find it necessarry to put a
> > comment elsewhere because I think it's pretty obivous that we would
> > never care about logging writes to device regions.
> > 
> > However, this made me think, are we making sure that we are not marking
> > device mappings as read-only in the wp_range functions?  I think it's
> 
> KVM_SET_USER_MEMORY_REGION ioctl doesn't check type of region being
> installed/operated on (KVM_MEM_LOG_DIRTY_PAGES), in case of QEMU
> these regions wind up in KVMState->KVMSlot[], when
> memory_region_add_subregion() is called KVM listener installs it.
> For migration and dirty page logging QEMU walks the KVMSlot[] array.
> 
> For QEMU VFIO (PCI) mmap()ing BAR of type IORESOURCE_MEM,
> causes the memory region to be added to KVMState->KVMSlot[].
> In that case it's possible to walk KVMState->KVMSlot[] issue
> the ioctl and  come across  a device mapping with normal memory and
> WP it's s2ptes (VFIO sets unmigrateble state though).
> 
> But I'm not sure what's there to stop someone calling the ioctl and
> install a region with device memory type. Most likely though if you
> installed that kind of region migration would be disabled.
> 
> But just for logging use not checking memory type could be an issue.
> 
I forgot that the current write-protect'ing is limited to the memory
region boundaries, so everything should be fine.

If user-space write-protects device memory regions, the worst
consequence is that it breaks the guest, but that's its own
responsibility, so I don't think you need to change anything.

-Christoffer

  parent reply	other threads:[~2014-08-18 12:50 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-25  0:56 [PATCH v9 0/4] arm: dirty page logging support for ARMv7 Mario Smarduch
2014-07-25  0:56 ` Mario Smarduch
2014-07-25  0:56 ` [PATCH v9 1/4] arm: add ARMv7 HYP API to flush VM TLBs, change generic TLB flush to support arch flush Mario Smarduch
2014-07-25  0:56   ` Mario Smarduch
2014-07-25  6:12   ` Alexander Graf
2014-07-25  6:12     ` Alexander Graf
2014-07-25 17:37     ` Mario Smarduch
2014-07-25 17:37       ` Mario Smarduch
2014-08-08 17:50       ` [PATCH v9 1/4] arm: add ARMv7 HYP API to flush VM TLBs ... - looking for comments Mario Smarduch
2014-08-08 17:50         ` Mario Smarduch
2014-08-11 19:12   ` [PATCH v9 1/4] arm: add ARMv7 HYP API to flush VM TLBs, change generic TLB flush to support arch flush Christoffer Dall
2014-08-11 19:12     ` Christoffer Dall
2014-08-11 23:54     ` Mario Smarduch
2014-08-11 23:54       ` Mario Smarduch
2014-07-25  0:56 ` [PATCH v9 2/4] arm: ARMv7 dirty page logging inital mem region write protect (w/no huge PUD support) Mario Smarduch
2014-07-25  0:56   ` Mario Smarduch
2014-07-25  6:16   ` Alexander Graf
2014-07-25  6:16     ` Alexander Graf
2014-07-25 17:45     ` Mario Smarduch
2014-07-25 17:45       ` Mario Smarduch
2014-08-11 19:12   ` Christoffer Dall
2014-08-11 19:12     ` Christoffer Dall
2014-08-12  0:16     ` Mario Smarduch
2014-08-12  0:16       ` Mario Smarduch
2014-08-12  9:32       ` Christoffer Dall
2014-08-12  9:32         ` Christoffer Dall
2014-08-12 23:17         ` Mario Smarduch
2014-08-12 23:17           ` Mario Smarduch
2014-08-12  1:36     ` Mario Smarduch
2014-08-12  1:36       ` Mario Smarduch
2014-08-12  9:36       ` Christoffer Dall
2014-08-12  9:36         ` Christoffer Dall
2014-08-12 21:08         ` Mario Smarduch
2014-08-12 21:08           ` Mario Smarduch
2014-07-25  0:56 ` [PATCH v9 3/4] arm: dirty log write protect mgmt. Moved x86, armv7 to generic, set armv8 ia64 mips powerpc s390 arch specific Mario Smarduch
2014-07-25  0:56   ` Mario Smarduch
2014-08-11 19:13   ` Christoffer Dall
2014-08-11 19:13     ` Christoffer Dall
2014-08-12  0:24     ` Mario Smarduch
2014-08-12  0:24       ` Mario Smarduch
2014-07-25  0:56 ` [PATCH v9 4/4] arm: ARMv7 dirty page logging 2nd stage page fault handling support Mario Smarduch
2014-07-25  0:56   ` Mario Smarduch
2014-08-11 19:13   ` Christoffer Dall
2014-08-11 19:13     ` Christoffer Dall
2014-08-12  1:25     ` Mario Smarduch
2014-08-12  1:25       ` Mario Smarduch
2014-08-12  9:50       ` Christoffer Dall
2014-08-12  9:50         ` Christoffer Dall
2014-08-13  1:27         ` Mario Smarduch
2014-08-13  1:27           ` Mario Smarduch
2014-08-13  7:30           ` Christoffer Dall
2014-08-13  7:30             ` Christoffer Dall
2014-08-14  1:20             ` Mario Smarduch
2014-08-14  1:20               ` Mario Smarduch
2014-08-15  0:01               ` Mario Smarduch
2014-08-15  0:01                 ` Mario Smarduch
2014-08-18 12:51               ` Christoffer Dall [this message]
2014-08-18 12:51                 ` Christoffer Dall
2014-08-18 17:42                 ` Mario Smarduch
2014-08-18 17:42                   ` Mario Smarduch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140818125105.GE19635@cbox \
    --to=christoffer.dall@linaro.org \
    --cc=agraf@suse.de \
    --cc=borntraeger@de.ibm.com \
    --cc=cornelia.huck@de.ibm.com \
    --cc=gleb@kernel.org \
    --cc=jays.lee@samsung.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=m.smarduch@samsung.com \
    --cc=marc.zyngier@arm.com \
    --cc=pbonzini@redhat.com \
    --cc=steve.capper@arm.com \
    --cc=sungjinn.chung@samsung.com \
    --cc=xiantao.zhang@intel.com \
    --cc=xiaoguangrong@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.