* [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference
@ 2017-04-25  8:59 Jan Beulich
  2017-04-25 10:59 ` Tim Deegan
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2017-04-25  8:59 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Julien Grall, Jann Horn

Jann's explanation of the problem:

"start situation:
 - domain A and domain B are PV domains
 - domain A and B both have currently scheduled vCPUs, and the vCPUs
   are not scheduled away
 - domain A has XSM_TARGET access to domain B
 - page X is owned by domain B and has no mappings
 - page X is zeroed

 steps:
 - domain A uses do_mmu_update() to map page X in domain A as writable
 - domain A accesses page X through the new PTE, creating a TLB entry
 - domain A removes its mapping of page X
   - type count of page X goes to 0
   - tlbflush_timestamp of page X is bumped
 - domain B maps page X as L1 pagetable
   - type of page X changes to PGT_l1_page_table
   - TLB flush is forced using domain_dirty_cpumask of domain B
   - page X is mapped as L1 pagetable in domain B

 At this point, domain B's vCPUs are guaranteed to have no
 incorrectly-typed stale TLB entries for page X, but AFAICS domain A's
 vCPUs can still have stale TLB entries that map page X as writable,
 permitting domain A to control a live pagetable of domain B."

Domain A necessarily is Dom0 (DomU-s with XSM_TARGET permission are
being created only for HVM domains, but domain B needs to be PV here),
so this is not a security issue, but nevertheless seems desirable to
correct.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Don't consider page's time stamp (relevant only for the owning
    domain).

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1266,6 +1266,18 @@ void put_page_from_l1e(l1_pgentry_t l1e,
     if ( (l1e_get_flags(l1e) & _PAGE_RW) && 
          ((l1e_owner == pg_owner) || !paging_mode_external(pg_owner)) )
     {
+        /*
+         * Don't leave stale writable TLB entries in the unmapping domain's
+         * page tables, to prevent them allowing access to pages required to
+         * be read-only (e.g. after pg_owner changed them to page table or
+         * segment descriptor pages).
+         */
+        if ( unlikely(l1e_owner != pg_owner) )
+        {
+            perfc_incr(need_flush_tlb_flush);
+            flush_tlb_mask(l1e_owner->domain_dirty_cpumask);
+        }
+
         put_page_and_type(page);
     }
     else





* Re: [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference
  2017-04-25  8:59 [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference Jan Beulich
@ 2017-04-25 10:59 ` Tim Deegan
  2017-04-25 11:59   ` Jan Beulich
  2017-04-26 14:07   ` Jan Beulich
  0 siblings, 2 replies; 18+ messages in thread
From: Tim Deegan @ 2017-04-25 10:59 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Julien Grall, Jann Horn, Andrew Cooper

Hi,

At 02:59 -0600 on 25 Apr (1493089158), Jan Beulich wrote:
> Jann's explanation of the problem:
> 
> "start situation:
>  - domain A and domain B are PV domains
>  - domain A and B both have currently scheduled vCPUs, and the vCPUs
>    are not scheduled away
>  - domain A has XSM_TARGET access to domain B
>  - page X is owned by domain B and has no mappings
>  - page X is zeroed
> 
>  steps:
>  - domain A uses do_mmu_update() to map page X in domain A as writable
>  - domain A accesses page X through the new PTE, creating a TLB entry
>  - domain A removes its mapping of page X
>    - type count of page X goes to 0
>    - tlbflush_timestamp of page X is bumped
>  - domain B maps page X as L1 pagetable
>    - type of page X changes to PGT_l1_page_table
>    - TLB flush is forced using domain_dirty_cpumask of domain B
>    - page X is mapped as L1 pagetable in domain B
> 
>  At this point, domain B's vCPUs are guaranteed to have no
>  incorrectly-typed stale TLB entries for page X, but AFAICS domain A's
>  vCPUs can still have stale TLB entries that map page X as writable,
>  permitting domain A to control a live pagetable of domain B."

AIUI this patch solves the problem by immediately flushing domain A's
TLB entries at the point where domain A removes its mapping of page X.

Could we, instead, bitwise OR domain A's domain_dirty_cpumask into
domain B's domain_dirty_cpumask at the same point?

Then when domain B flushes TLBs in the last step (in __get_page_type())
it will catch any stale TLB entries from domain A as well.  But in the
(hopefully common) case where there's a delay between domain A's
__put_page_type() and domain B's __get_page_type(), the usual TLB
timestamp filtering will suppress some of the IPIs/flushes.
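
For illustration only, that alternative might look roughly like the
below in put_page_from_l1e() (a sketch; the cpumask_or() used here is
not atomic, which is glossed over):

    if ( unlikely(l1e_owner != pg_owner) )
    {
        /*
         * Instead of flushing right away, make the flush which
         * pg_owner's next type change performs in __get_page_type()
         * also cover the unmapping domain's CPUs.
         */
        cpumask_or(pg_owner->domain_dirty_cpumask,
                   pg_owner->domain_dirty_cpumask,
                   l1e_owner->domain_dirty_cpumask);
    }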

Cheers,

Tim.


> Domain A necessarily is Dom0 (DomU-s with XSM_TARGET permission are
> being created only for HVM domains, but domain B needs to be PV here),
> so this is not a security issue, but nevertheless seems desirable to
> correct.
> 
> Reported-by: Jann Horn <jannh@google.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: Don't consider page's time stamp (relevant only for the owning
>     domain).
> 
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -1266,6 +1266,18 @@ void put_page_from_l1e(l1_pgentry_t l1e,
>      if ( (l1e_get_flags(l1e) & _PAGE_RW) && 
>           ((l1e_owner == pg_owner) || !paging_mode_external(pg_owner)) )
>      {
> +        /*
> +         * Don't leave stale writable TLB entries in the unmapping domain's
> +         * page tables, to prevent them allowing access to pages required to
> +         * be read-only (e.g. after pg_owner changed them to page table or
> +         * segment descriptor pages).
> +         */
> +        if ( unlikely(l1e_owner != pg_owner) )
> +        {
> +            perfc_incr(need_flush_tlb_flush);
> +            flush_tlb_mask(l1e_owner->domain_dirty_cpumask);
> +        }
> +
>          put_page_and_type(page);
>      }
>      else


* Re: [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference
  2017-04-25 10:59 ` Tim Deegan
@ 2017-04-25 11:59   ` Jan Beulich
  2017-04-26  8:44     ` Tim Deegan
  2017-04-26 14:07   ` Jan Beulich
  1 sibling, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2017-04-25 11:59 UTC (permalink / raw)
  To: Jann Horn, Tim Deegan; +Cc: Andrew Cooper, Julien Grall, xen-devel

>>> On 25.04.17 at 12:59, <tim@xen.org> wrote:
> Hi,
> 
> At 02:59 -0600 on 25 Apr (1493089158), Jan Beulich wrote:
>> Jann's explanation of the problem:
>> 
>> "start situation:
>>  - domain A and domain B are PV domains
>>  - domain A and B both have currently scheduled vCPUs, and the vCPUs
>>    are not scheduled away
>>  - domain A has XSM_TARGET access to domain B
>>  - page X is owned by domain B and has no mappings
>>  - page X is zeroed
>> 
>>  steps:
>>  - domain A uses do_mmu_update() to map page X in domain A as writable
>>  - domain A accesses page X through the new PTE, creating a TLB entry
>>  - domain A removes its mapping of page X
>>    - type count of page X goes to 0
>>    - tlbflush_timestamp of page X is bumped
>>  - domain B maps page X as L1 pagetable
>>    - type of page X changes to PGT_l1_page_table
>>    - TLB flush is forced using domain_dirty_cpumask of domain B
>>    - page X is mapped as L1 pagetable in domain B
>> 
>>  At this point, domain B's vCPUs are guaranteed to have no
>>  incorrectly-typed stale TLB entries for page X, but AFAICS domain A's
>>  vCPUs can still have stale TLB entries that map page X as writable,
>>  permitting domain A to control a live pagetable of domain B."
> 
> AIUI this patch solves the problem by immediately flushing domain A's
> TLB entries at the point where domain A removes its mapping of page X.
> 
> Could we, instead, bitwise OR domain A's domain_dirty_cpumask into
> domain B's domain_dirty_cpumask at the same point?
> 
> Then when domain B flushes TLBs in the last step (in __get_page_type())
> it will catch any stale TLB entries from domain A as well.  But in the
> (hopefully common) case where there's a delay between domain A's
> __put_page_type() and domain B's __get_page_type(), the usual TLB
> timestamp filtering will suppress some of the IPIs/flushes.

Oh, I see. Yes, I think this would be fine. However, we don't have
a suitable cpumask accessor allowing us to do this ORing atomically,
so we'd have to open code it. Do you think such a slightly ugly
approach would be worth it here? Foreign mappings shouldn't be
_that_ performance critical...
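
For concreteness, a minimal sketch of such open coding, relying on
cpumask_set_cpu() (i.e. set_bit()) being atomic per bit:

    unsigned int cpu;

    for_each_cpu ( cpu, l1e_owner->domain_dirty_cpumask )
        cpumask_set_cpu(cpu, pg_owner->domain_dirty_cpumask);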

And then, considering that this will result in time stamp based filtering
again, I'm no longer sure I was right to agree with Jann on the flush
here needing to be unconditional. Regardless of page table owner
matching page owner, the time stamp stored for the page will always
be applicable (it's a global property). So we wouldn't even need to
OR in the whole dirty mask here, but could already pre-filter (or if we
stayed with the flush-on-put approach, then v1 would have been
correct).

Jan



* Re: [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference
  2017-04-25 11:59   ` Jan Beulich
@ 2017-04-26  8:44     ` Tim Deegan
  2017-04-26  9:01       ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Tim Deegan @ 2017-04-26  8:44 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Julien Grall, xen-devel, Jann Horn

At 05:59 -0600 on 25 Apr (1493099950), Jan Beulich wrote:
> >>> On 25.04.17 at 12:59, <tim@xen.org> wrote:
> > Hi,
> > 
> > At 02:59 -0600 on 25 Apr (1493089158), Jan Beulich wrote:
> >> Jann's explanation of the problem:
> >> 
> >> "start situation:
> >>  - domain A and domain B are PV domains
> >>  - domain A and B both have currently scheduled vCPUs, and the vCPUs
> >>    are not scheduled away
> >>  - domain A has XSM_TARGET access to domain B
> >>  - page X is owned by domain B and has no mappings
> >>  - page X is zeroed
> >> 
> >>  steps:
> >>  - domain A uses do_mmu_update() to map page X in domain A as writable
> >>  - domain A accesses page X through the new PTE, creating a TLB entry
> >>  - domain A removes its mapping of page X
> >>    - type count of page X goes to 0
> >>    - tlbflush_timestamp of page X is bumped
> >>  - domain B maps page X as L1 pagetable
> >>    - type of page X changes to PGT_l1_page_table
> >>    - TLB flush is forced using domain_dirty_cpumask of domain B
> >>    - page X is mapped as L1 pagetable in domain B
> >> 
> >>  At this point, domain B's vCPUs are guaranteed to have no
> >>  incorrectly-typed stale TLB entries for page X, but AFAICS domain A's
> >>  vCPUs can still have stale TLB entries that map page X as writable,
> >>  permitting domain A to control a live pagetable of domain B."
> > 
> > AIUI this patch solves the problem by immediately flushing domain A's
> > TLB entries at the point where domain A removes its mapping of page X.
> > 
> > Could we, instead, bitwise OR domain A's domain_dirty_cpumask into
> > domain B's domain_dirty_cpumask at the same point?
> > 
> > Then when domain B flushes TLBs in the last step (in __get_page_type())
> > it will catch any stale TLB entries from domain A as well.  But in the
> > (hopefully common) case where there's a delay between domain A's
> > __put_page_type() and domain B's __get_page_type(), the usual TLB
> > timestamp filtering will suppress some of the IPIs/flushes.
> 
> Oh, I see. Yes, I think this would be fine. However, we don't have
> a suitable cpumask accessor allowing us to do this ORing atomically,
> so we'd have to open code it.

Probably better to build the accessor than to open code here. :)

> Do you think such a slightly ugly approach would be worth it here?
> Foreign mappings shouldn't be _that_ performance critical...

I have no real idea, though there are quite a lot of them in domain
building/migration.  I can imagine a busy multi-vcpu dom0 could
generate a lot of IPIs, almost all of which could be merged.

> And then, considering that this will result in time stamp based filtering
> again, I'm no longer sure I was right to agree with Jann on the flush
> here needing to be unconditional. Regardless of page table owner
> matching page owner, the time stamp stored for the page will always
> be applicable (it's a global property). So we wouldn't even need to
> OR in the whole dirty mask here, but could already pre-filter (or if we
> stayed with the flush-on-put approach, then v1 would have been
> correct).

I don't think so.  The page's timestamp is set when its typecount
falls to zero, which hasn't happened yet -- we hold a typecount
ourselves here.

In theory we could filter the bits we're adding against a local
timestamp, but that would have to be tlbflush_current_time()
because the TLB entries we care about are live right now, and
filtering against that is (probably) a noop.
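
To illustrate (a sketch, not verbatim Xen code):

    /* Pre-filtering would use a timestamp recorded the last time the
     * type count fell to zero - which hasn't happened yet here: */
    tlbflush_filter(mask, page->tlbflush_timestamp);

    /* What the currently live entries would actually require - and
     * this drops (almost) nothing from the mask: */
    tlbflush_filter(mask, tlbflush_current_time());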

Tim.


* Re: [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference
  2017-04-26  8:44     ` Tim Deegan
@ 2017-04-26  9:01       ` Jan Beulich
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2017-04-26  9:01 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Andrew Cooper, Julien Grall, xen-devel, Jann Horn

>>> On 26.04.17 at 10:44, <tim@xen.org> wrote:
> At 05:59 -0600 on 25 Apr (1493099950), Jan Beulich wrote:
>> >>> On 25.04.17 at 12:59, <tim@xen.org> wrote:
>> > Hi,
>> > 
>> > At 02:59 -0600 on 25 Apr (1493089158), Jan Beulich wrote:
>> >> Jann's explanation of the problem:
>> >> 
>> >> "start situation:
>> >>  - domain A and domain B are PV domains
>> >>  - domain A and B both have currently scheduled vCPUs, and the vCPUs
>> >>    are not scheduled away
>> >>  - domain A has XSM_TARGET access to domain B
>> >>  - page X is owned by domain B and has no mappings
>> >>  - page X is zeroed
>> >> 
>> >>  steps:
>> >>  - domain A uses do_mmu_update() to map page X in domain A as writable
>> >>  - domain A accesses page X through the new PTE, creating a TLB entry
>> >>  - domain A removes its mapping of page X
>> >>    - type count of page X goes to 0
>> >>    - tlbflush_timestamp of page X is bumped
>> >>  - domain B maps page X as L1 pagetable
>> >>    - type of page X changes to PGT_l1_page_table
>> >>    - TLB flush is forced using domain_dirty_cpumask of domain B
>> >>    - page X is mapped as L1 pagetable in domain B
>> >> 
>> >>  At this point, domain B's vCPUs are guaranteed to have no
>> >>  incorrectly-typed stale TLB entries for page X, but AFAICS domain A's
>> >>  vCPUs can still have stale TLB entries that map page X as writable,
>> >>  permitting domain A to control a live pagetable of domain B."
>> > 
>> > AIUI this patch solves the problem by immediately flushing domain A's
>> > TLB entries at the point where domain A removes its mapping of page X.
>> > 
>> > Could we, instead, bitwise OR domain A's domain_dirty_cpumask into
>> > domain B's domain_dirty_cpumask at the same point?
>> > 
>> > Then when domain B flushes TLBs in the last step (in __get_page_type())
>> > it will catch any stale TLB entries from domain A as well.  But in the
>> > (hopefully common) case where there's a delay between domain A's
>> > __put_page_type() and domain B's __get_page_type(), the usual TLB
>> > timestamp filtering will suppress some of the IPIs/flushes.
>> 
>> Oh, I see. Yes, I think this would be fine. However, we don't have
>> a suitable cpumask accessor allowing us to do this ORing atomically,
>> so we'd have to open code it.
> 
> Probably better to build the accessor than to open code here. :)

Hmm, that would mean building a whole group of accessors (as
I wouldn't want to introduce an atomic OR one without any of
the others which exist as non-atomic ones). Plus there would be
the question of how to name them - the current inconsistency
(single bit operations being atomic unless prefixed by two
underscores, while multi-bit operations are non-atomic despite
their lack of leading underscores) doesn't really help here.

IOW the original question wasn't really whether to introduce
accessors, but whether the approach you suggest is worthwhile
despite the lack of such accessors. Which I think you ...

>> Do you think such a slightly ugly approach would be worth it here?
>> Foreign mappings shouldn't be _that_ performance critical...
> 
> I have no real idea, though there are quite a lot of them in domain
> building/migration.  I can imagine a busy multi-vcpu dom0 could
> generate a lot of IPIs, almost all of which could be merged.

... believe it would be.

>> And then, considering that this will result in time stamp based filtering
>> again, I'm no longer sure I was right to agree with Jann on the flush
>> here needing to be unconditional. Regardless of page table owner
>> matching page owner, the time stamp stored for the page will always
>> be applicable (it's a global property). So we wouldn't even need to
>> OR in the whole dirty mask here, but could already pre-filter (or if we
>> stayed with the flush-on-put approach, then v1 would have been
>> correct).
> 
> I don't think so.  The page's timestamp is set when its typecount
> falls to zero, which hasn't happened yet -- we hold a typecount
> ourselves here.
> 
> In theory we could filter the bits we're adding against a local
> timestamp, but that would have to be tlbflush_current_time()
> because the TLB entries we care about are live right now, and
> filtering against that is (probably) a noop.

Good point. I guess I'll switch to the open coded merging approach
then for v3.

Jan


* Re: [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference
  2017-04-25 10:59 ` Tim Deegan
  2017-04-25 11:59   ` Jan Beulich
@ 2017-04-26 14:07   ` Jan Beulich
  2017-04-26 14:25     ` Tim Deegan
  1 sibling, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2017-04-26 14:07 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Andrew Cooper, Julien Grall, xen-devel, Jann Horn

>>> On 25.04.17 at 12:59, <tim@xen.org> wrote:
> Hi,
> 
> At 02:59 -0600 on 25 Apr (1493089158), Jan Beulich wrote:
>> Jann's explanation of the problem:
>> 
>> "start situation:
>>  - domain A and domain B are PV domains
>>  - domain A and B both have currently scheduled vCPUs, and the vCPUs
>>    are not scheduled away
>>  - domain A has XSM_TARGET access to domain B
>>  - page X is owned by domain B and has no mappings
>>  - page X is zeroed
>> 
>>  steps:
>>  - domain A uses do_mmu_update() to map page X in domain A as writable
>>  - domain A accesses page X through the new PTE, creating a TLB entry
>>  - domain A removes its mapping of page X
>>    - type count of page X goes to 0
>>    - tlbflush_timestamp of page X is bumped
>>  - domain B maps page X as L1 pagetable
>>    - type of page X changes to PGT_l1_page_table
>>    - TLB flush is forced using domain_dirty_cpumask of domain B
>>    - page X is mapped as L1 pagetable in domain B
>> 
>>  At this point, domain B's vCPUs are guaranteed to have no
>>  incorrectly-typed stale TLB entries for page X, but AFAICS domain A's
>>  vCPUs can still have stale TLB entries that map page X as writable,
>>  permitting domain A to control a live pagetable of domain B."
> 
> AIUI this patch solves the problem by immediately flushing domain A's
> TLB entries at the point where domain A removes its mapping of page X.
> 
> Could we, instead, bitwise OR domain A's domain_dirty_cpumask into
> domain B's domain_dirty_cpumask at the same point?
> 
> Then when domain B flushes TLBs in the last step (in __get_page_type())
> it will catch any stale TLB entries from domain A as well.  But in the
> (hopefully common) case where there's a delay between domain A's
> __put_page_type() and domain B's __get_page_type(), the usual TLB
> timestamp filtering will suppress some of the IPIs/flushes.

So I've given this a try, and failed miserably (including losing an
XFS volume on the test machine). The problem is the BUG_ON() at
the top of domain_relinquish_resources() - there will, very likely, be
bits remaining set if the code added to put_page_from_l1e() set
some pretty recently (even if we avoid setting any once
->is_dying has been set).
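
(For reference, the assertion in question is, approximately,

    BUG_ON(!cpumask_empty(d->domain_dirty_cpumask));

i.e. it expects the dying domain's dirty mask to already be empty,
which can no longer be relied upon once foreign unmappers' CPUs get
merged into it.)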

Jan



* Re: [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference
  2017-04-26 14:07   ` Jan Beulich
@ 2017-04-26 14:25     ` Tim Deegan
  2017-04-27  9:23       ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Tim Deegan @ 2017-04-26 14:25 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Julien Grall, xen-devel, Jann Horn

At 08:07 -0600 on 26 Apr (1493194043), Jan Beulich wrote:
> >>> On 25.04.17 at 12:59, <tim@xen.org> wrote:
> > Hi,
> > 
> > At 02:59 -0600 on 25 Apr (1493089158), Jan Beulich wrote:
> >> Jann's explanation of the problem:
> >> 
> >> "start situation:
> >>  - domain A and domain B are PV domains
> >>  - domain A and B both have currently scheduled vCPUs, and the vCPUs
> >>    are not scheduled away
> >>  - domain A has XSM_TARGET access to domain B
> >>  - page X is owned by domain B and has no mappings
> >>  - page X is zeroed
> >> 
> >>  steps:
> >>  - domain A uses do_mmu_update() to map page X in domain A as writable
> >>  - domain A accesses page X through the new PTE, creating a TLB entry
> >>  - domain A removes its mapping of page X
> >>    - type count of page X goes to 0
> >>    - tlbflush_timestamp of page X is bumped
> >>  - domain B maps page X as L1 pagetable
> >>    - type of page X changes to PGT_l1_page_table
> >>    - TLB flush is forced using domain_dirty_cpumask of domain B
> >>    - page X is mapped as L1 pagetable in domain B
> >> 
> >>  At this point, domain B's vCPUs are guaranteed to have no
> >>  incorrectly-typed stale TLB entries for page X, but AFAICS domain A's
> >>  vCPUs can still have stale TLB entries that map page X as writable,
> >>  permitting domain A to control a live pagetable of domain B."
> > 
> > AIUI this patch solves the problem by immediately flushing domain A's
> > TLB entries at the point where domain A removes its mapping of page X.
> > 
> > Could we, instead, bitwise OR domain A's domain_dirty_cpumask into
> > domain B's domain_dirty_cpumask at the same point?
> > 
> > Then when domain B flushes TLBs in the last step (in __get_page_type())
> > it will catch any stale TLB entries from domain A as well.  But in the
> > (hopefully common) case where there's a delay between domain A's
> > __put_page_type() and domain B's __get_page_type(), the usual TLB
> > timestamp filtering will suppress some of the IPIs/flushes.
> 
> So I've given this a try, and failed miserably (including losing an
> XFS volume on the test machine). The problem is the BUG_ON() at
> the top of domain_relinquish_resources() - there will, very likely, be
> bits remaining set if the code added to put_page_from_l1e() set
> some pretty recently (irrespective of avoiding to set any once
> ->is_dying has been set).

Yeah. :(  Would it be correct to just remove that BUG_ON(), or replace it
with an explicit check that there are no running vcpus?

Or is using domain_dirty_cpumask like this too much of a stretch?
E.g. PV TLB flushes use it, and would maybe be more expensive until
the dom0 CPUs fall out of the mask (which isn't guaranteed to happen).

We could add a new mask just for this case, and clear CPUs from it as
they're flushed.  But that sounds like a lot of work...

Maybe worth measuring the impact of the current patch before going too
far with this?

Tim.


* Re: [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference
  2017-04-26 14:25     ` Tim Deegan
@ 2017-04-27  9:23       ` Jan Beulich
  2017-04-27  9:51         ` Tim Deegan
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2017-04-27  9:23 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Andrew Cooper, Julien Grall, xen-devel, Jann Horn

>>> On 26.04.17 at 16:25, <tim@xen.org> wrote:
> At 08:07 -0600 on 26 Apr (1493194043), Jan Beulich wrote:
>> >>> On 25.04.17 at 12:59, <tim@xen.org> wrote:
>> > Hi,
>> > 
>> > At 02:59 -0600 on 25 Apr (1493089158), Jan Beulich wrote:
>> >> Jann's explanation of the problem:
>> >> 
>> >> "start situation:
>> >>  - domain A and domain B are PV domains
>> >>  - domain A and B both have currently scheduled vCPUs, and the vCPUs
>> >>    are not scheduled away
>> >>  - domain A has XSM_TARGET access to domain B
>> >>  - page X is owned by domain B and has no mappings
>> >>  - page X is zeroed
>> >> 
>> >>  steps:
>> >>  - domain A uses do_mmu_update() to map page X in domain A as writable
>> >>  - domain A accesses page X through the new PTE, creating a TLB entry
>> >>  - domain A removes its mapping of page X
>> >>    - type count of page X goes to 0
>> >>    - tlbflush_timestamp of page X is bumped
>> >>  - domain B maps page X as L1 pagetable
>> >>    - type of page X changes to PGT_l1_page_table
>> >>    - TLB flush is forced using domain_dirty_cpumask of domain B
>> >>    - page X is mapped as L1 pagetable in domain B
>> >> 
>> >>  At this point, domain B's vCPUs are guaranteed to have no
>> >>  incorrectly-typed stale TLB entries for page X, but AFAICS domain A's
>> >>  vCPUs can still have stale TLB entries that map page X as writable,
>> >>  permitting domain A to control a live pagetable of domain B."
>> > 
>> > AIUI this patch solves the problem by immediately flushing domain A's
>> > TLB entries at the point where domain A removes its mapping of page X.
>> > 
>> > Could we, instead, bitwise OR domain A's domain_dirty_cpumask into
>> > domain B's domain_dirty_cpumask at the same point?
>> > 
>> > Then when domain B flushes TLBs in the last step (in __get_page_type())
>> > it will catch any stale TLB entries from domain A as well.  But in the
>> > (hopefully common) case where there's a delay between domain A's
>> > __put_page_type() and domain B's __get_page_type(), the usual TLB
>> > timestamp filtering will suppress some of the IPIs/flushes.
>> 
>> So I've given this a try, and failed miserably (including losing an
>> XFS volume on the test machine). The problem is the BUG_ON() at
>> the top of domain_relinquish_resources() - there will, very likely, be
>> bits remaining set if the code added to put_page_from_l1e() set
>> some pretty recently (irrespective of avoiding to set any once
>> ->is_dying has been set).
> 
> Yeah. :(  Would it be correct to just remove that BUG_ON(), or replace it
> with an explicit check that there are no running vcpus?
> 
> Or is using domain_dirty_cpumask like this too much of a stretch?
> E.g. PV TLB flushes use it, and would maybe be more expensive until
> the dom0 CPUs fall out of the mask (which isn't guaranteed to happen).

Right, since effectively some of the bits may never clear (if
pg_owner never runs on some pCPU l1e_owner does run on), I
now think this model could even have introduced more overhead
than the immediate flushing.

> We could add a new mask just for this case, and clear CPUs from it as
> they're flushed.  But that sounds like a lot of work...

Wouldn't it suffice to set bits in this mask in put_page_from_l1e()
and consume/clear them in __get_page_type()? Right now I can't
see it being necessary for correctness to fiddle with any of the
other flushes using the domain dirty mask.
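
A minimal sketch of that idea, with dirty_foreign_cpumask being a made
up per-domain field, mask being the local cpumask_t used for the flush,
and the ordering/race questions raised further down being glossed over
(the recorded CPUs deliberately aren't timestamp-filtered here):

    /* put_page_from_l1e(): record instead of flushing. */
    if ( unlikely(l1e_owner != pg_owner) )
    {
        unsigned int cpu;

        for_each_cpu ( cpu, l1e_owner->domain_dirty_cpumask )
            cpumask_set_cpu(cpu, pg_owner->dirty_foreign_cpumask);
    }

    /* __get_page_type(): consume on the next type change flush. */
    cpumask_copy(&mask, d->domain_dirty_cpumask);
    tlbflush_filter(mask, page->tlbflush_timestamp);
    cpumask_or(&mask, &mask, d->dirty_foreign_cpumask);
    cpumask_clear(d->dirty_foreign_cpumask);
    if ( !cpumask_empty(&mask) )
    {
        perfc_incr(need_flush_tlb_flush);
        flush_tlb_mask(&mask);
    }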

But then again this may not be much of a win, unless the put
operations come through in meaningful batches, not interleaved
by any type changes (the latter ought to be guaranteed during
domain construction and teardown at least, as the guest itself
can't do anything at that time to effect type changes). Hence I
wonder whether ...

> Maybe worth measuring the impact of the current patch before going too
> far with this?

... it wouldn't better be the other way around: We use the patch
in its current (or even v1) form, and try to do something about
performance only if we really find a case where it matters. To be
honest, I'm not even sure how I could meaningfully measure the
impact here: Simply counting how many extra flushes there would
end up being wouldn't seem all that useful, and whether there
would be any measurable difference in the overall execution time
of e.g. domain creation I would highly doubt (but if that's what
you're after, I could certainly collect a few numbers).

Jan


* Re: [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference
  2017-04-27  9:23       ` Jan Beulich
@ 2017-04-27  9:51         ` Tim Deegan
  2017-04-28 10:52           ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Tim Deegan @ 2017-04-27  9:51 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Julien Grall, xen-devel, Jann Horn

At 03:23 -0600 on 27 Apr (1493263380), Jan Beulich wrote:
> ... it wouldn't better be the other way around: We use the patch
> in its current (or even v1) form, and try to do something about
> performance only if we really find a case where it matters. To be
> honest, I'm not even sure how I could meaningfully measure the
> impact here: Simply counting how many extra flushes there would
> end up being wouldn't seem all that useful, and whether there
> would be any measurable difference in the overall execution time
> of e.g. domain creation I would highly doubt (but if it's that what
> you're after, I could certainly collect a few numbers).

I think that would be a good idea, just as a sanity-check.  But apart
from that the patch looks correct to me, so:

Reviewed-by: Tim Deegan <tim@xen.org>

for v2 (not v1).

Cheers,

Tim.


* Re: [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference
  2017-04-27  9:51         ` Tim Deegan
@ 2017-04-28 10:52           ` Jan Beulich
  2017-05-02  8:32             ` Tim Deegan
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2017-04-28 10:52 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Andrew Cooper, Julien Grall, xen-devel, Jann Horn

>>> On 27.04.17 at 11:51, <tim@xen.org> wrote:
> At 03:23 -0600 on 27 Apr (1493263380), Jan Beulich wrote:
>> ... it wouldn't better be the other way around: We use the patch
>> in its current (or even v1) form, and try to do something about
>> performance only if we really find a case where it matters. To be
>> honest, I'm not even sure how I could meaningfully measure the
>> impact here: Simply counting how many extra flushes there would
>> end up being wouldn't seem all that useful, and whether there
>> would be any measurable difference in the overall execution time
>> of e.g. domain creation I would highly doubt (but if it's that what
>> you're after, I could certainly collect a few numbers).
> 
> I think that would be a good idea, just as a sanity-check.

As it turns out there is a measurable effect: xc_dom_boot_image()
for a 4Gb PV guest takes about 70% longer now. Otoh it is itself
responsible for less than 10% of the overall time libxl__build_dom()
takes, and that in turn is only a pretty small portion of the overall
"xl create".

Jan



* Re: [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference
  2017-04-28 10:52           ` Jan Beulich
@ 2017-05-02  8:32             ` Tim Deegan
  2017-05-02  8:50               ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Tim Deegan @ 2017-05-02  8:32 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Julien Grall, xen-devel, Jann Horn

At 04:52 -0600 on 28 Apr (1493355160), Jan Beulich wrote:
> >>> On 27.04.17 at 11:51, <tim@xen.org> wrote:
> > At 03:23 -0600 on 27 Apr (1493263380), Jan Beulich wrote:
> >> ... it wouldn't better be the other way around: We use the patch
> >> in its current (or even v1) form, and try to do something about
> >> performance only if we really find a case where it matters. To be
> >> honest, I'm not even sure how I could meaningfully measure the
> >> impact here: Simply counting how many extra flushes there would
> >> end up being wouldn't seem all that useful, and whether there
> >> would be any measurable difference in the overall execution time
> >> of e.g. domain creation I would highly doubt (but if it's that what
> >> you're after, I could certainly collect a few numbers).
> > 
> > I think that would be a good idea, just as a sanity-check.
> 
> As it turns out there is a measurable effect: xc_dom_boot_image()
> for a 4Gb PV guest takes about 70% longer now. Otoh it is itself
> responsible for less than 10% of the overall time libxl__build_dom()
> takes, and that in turn is only a pretty small portion of the overall
> "xl create".

Do you think that slowdown is OK?  I'm not sure -- I'd be inclined to
avoid it, but could be persuaded, and it's not me doing the work. :)
Andrew, what do you think?

Tim.


* Re: [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference
  2017-05-02  8:32             ` Tim Deegan
@ 2017-05-02  8:50               ` Jan Beulich
  2017-05-02  9:43                 ` Tim Deegan
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2017-05-02  8:50 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Andrew Cooper, Julien Grall, xen-devel, Jann Horn

>>> On 02.05.17 at 10:32, <tim@xen.org> wrote:
> At 04:52 -0600 on 28 Apr (1493355160), Jan Beulich wrote:
>> >>> On 27.04.17 at 11:51, <tim@xen.org> wrote:
>> > At 03:23 -0600 on 27 Apr (1493263380), Jan Beulich wrote:
>> >> ... it wouldn't better be the other way around: We use the patch
>> >> in its current (or even v1) form, and try to do something about
>> >> performance only if we really find a case where it matters. To be
>> >> honest, I'm not even sure how I could meaningfully measure the
>> >> impact here: Simply counting how many extra flushes there would
>> >> end up being wouldn't seem all that useful, and whether there
>> >> would be any measurable difference in the overall execution time
>> >> of e.g. domain creation I would highly doubt (but if it's that what
>> >> you're after, I could certainly collect a few numbers).
>> > 
>> > I think that would be a good idea, just as a sanity-check.
>> 
>> As it turns out there is a measurable effect: xc_dom_boot_image()
>> for a 4Gb PV guest takes about 70% longer now. Otoh it is itself
>> responsible for less than 10% of the overall time libxl__build_dom()
>> takes, and that in turn is only a pretty small portion of the overall
>> "xl create".
> 
> Do you think that slowdown is OK?  I'm not sure -- I'd be inclined to
> avoid it, but could be persuaded, and it's not me doing the work. :)

Well, if there was a way to avoid it in a clean way without too much
code churn, I'd be all for avoiding it. The avenues we've explored so
far either didn't work (using pg_owner's dirty mask) or didn't promise
to actually reduce the flush overhead in a meaningful way (adding a
separate mask to be merged into the mask used for the flush in
__get_page_type()), unless - as has been the case before - I didn't
fully understand your thoughts there.

Jan



* Re: [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference
  2017-05-02  8:50               ` Jan Beulich
@ 2017-05-02  9:43                 ` Tim Deegan
  2017-05-02 17:37                   ` Andrew Cooper
  0 siblings, 1 reply; 18+ messages in thread
From: Tim Deegan @ 2017-05-02  9:43 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Julien Grall, xen-devel, Jann Horn

At 02:50 -0600 on 02 May (1493693403), Jan Beulich wrote:
> >>> On 02.05.17 at 10:32, <tim@xen.org> wrote:
> > At 04:52 -0600 on 28 Apr (1493355160), Jan Beulich wrote:
> >> >>> On 27.04.17 at 11:51, <tim@xen.org> wrote:
> >> > At 03:23 -0600 on 27 Apr (1493263380), Jan Beulich wrote:
> >> >> ... it wouldn't better be the other way around: We use the patch
> >> >> in its current (or even v1) form, and try to do something about
> >> >> performance only if we really find a case where it matters. To be
> >> >> honest, I'm not even sure how I could meaningfully measure the
> >> >> impact here: Simply counting how many extra flushes there would
> >> >> end up being wouldn't seem all that useful, and whether there
> >> >> would be any measurable difference in the overall execution time
> >> >> of e.g. domain creation I would highly doubt (but if it's that what
> >> >> you're after, I could certainly collect a few numbers).
> >> > 
> >> > I think that would be a good idea, just as a sanity-check.
> >> 
> >> As it turns out there is a measurable effect: xc_dom_boot_image()
> >> for a 4Gb PV guest takes about 70% longer now. Otoh it is itself
> >> responsible for less than 10% of the overall time libxl__build_dom()
> >> takes, and that in turn is only a pretty small portion of the overall
> >> "xl create".
> > 
> > Do you think that slowdown is OK?  I'm not sure -- I'd be inclined to
> > avoid it, but could be persuaded, and it's not me doing the work. :)
> 
> Well, if there was a way to avoid it in a clean way without too much
> code churn, I'd be all for avoiding it. The avenues we've explored so
> far either didn't work (using pg_owner's dirty mask) or didn't promise
> to actually reduce the flush overhead in a meaningful way (adding a
> separate mask to be merged into the mask used for the flush in
> __get_page_type()), unless - as has been the case before - I didn't
> fully understand your thoughts there.

Quoting your earlier response:

> Wouldn't it suffice to set bits in this mask in put_page_from_l1e()
> and consume/clear them in __get_page_type()? Right now I can't
> see it being necessary for correctness to fiddle with any of the
> other flushes using the domain dirty mask.
> 
> But then again this may not be much of a win, unless the put
> operations come through in meaningful batches, not interleaved
> by any type changes (the latter ought to be guaranteed during
> domain construction and teardown at least, as the guest itself
> can't do anything at that time to effect type changes).

I'm not sure how much batching there needs to be.  I agree that the
domain creation case should work well though.  Let me think about the
scenarios when dom B is live:

1. Dom A drops its foreign map of page X; dom B immediately changes the
type of page X.  This case isn't helped at all, but I don't see any
way to improve it -- dom A's TLBs need to be flushed right away.

2. Dom A drops its foreign map of page X; dom B immediately changes
the type of page Y.  Now dom A's dirty CPUs are in the new map, but B
may not need to flush them right away.  B can filter by page Y's
timestamp, and flush (and clear) only some of the cpus in the map.
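
A sketch of the bookkeeping being described here (using the same kind
of hypothetical per-domain map as earlier in the thread, say
d->dirty_foreign_cpumask, with mask a local cpumask_t; whether this
filtering is actually safe is exactly the question below):

    cpumask_copy(&mask, d->dirty_foreign_cpumask);
    tlbflush_filter(mask, page->tlbflush_timestamp); /* still to flush */
    cpumask_andnot(d->dirty_foreign_cpumask,
                   d->dirty_foreign_cpumask, &mask); /* drop only these */
    if ( !cpumask_empty(&mask) )
        flush_tlb_mask(&mask);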

So that seems good, but then there's a risk that cpus never get
cleared from the map, and __get_page_type() ends up doing a lot of
unnecessary work filtering timestamps.  When is it safe to remove a CPU
from that map?
 - obvs safe if we IPI it to flush the TLB (though may need memory
   barriers -- need to think about a race with CPU C putting A _into_
   the map at the same time...)
 - we could track the timestamp of the most recent addition to the
   map, and drop any CPU whose TLB has been flushed since that,
   but that still lets unrelated unmaps keep CPUs alive in the map...
 - we could double-buffer the map: always add CPUs to the active map;
   from time to time, swap maps and flush everything in the non-active
   map (filtered by the TLB timestamp when we last swapped over).

Bah, this is turning into a tar pit.  Let's stick to the v2 patch as
being (relatively) simple and correct, and revisit this if it causes
trouble. :)

Thanks,

Tim.


* Re: [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference
  2017-05-02  9:43                 ` Tim Deegan
@ 2017-05-02 17:37                   ` Andrew Cooper
  2017-05-03  7:21                     ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Cooper @ 2017-05-02 17:37 UTC (permalink / raw)
  To: Tim Deegan, Jan Beulich; +Cc: xen-devel, Julien Grall, Jann Horn

On 02/05/17 10:43, Tim Deegan wrote:
> At 02:50 -0600 on 02 May (1493693403), Jan Beulich wrote:
>>>>> On 02.05.17 at 10:32, <tim@xen.org> wrote:
>>> At 04:52 -0600 on 28 Apr (1493355160), Jan Beulich wrote:
>>>>>>> On 27.04.17 at 11:51, <tim@xen.org> wrote:
>>>>> At 03:23 -0600 on 27 Apr (1493263380), Jan Beulich wrote:
>>>>>> ... it wouldn't better be the other way around: We use the patch
>>>>>> in its current (or even v1) form, and try to do something about
>>>>>> performance only if we really find a case where it matters. To be
>>>>>> honest, I'm not even sure how I could meaningfully measure the
>>>>>> impact here: Simply counting how many extra flushes there would
>>>>>> end up being wouldn't seem all that useful, and whether there
>>>>>> would be any measurable difference in the overall execution time
>>>>>> of e.g. domain creation I would highly doubt (but if it's that what
>>>>>> you're after, I could certainly collect a few numbers).
>>>>> I think that would be a good idea, just as a sanity-check.
>>>> As it turns out there is a measurable effect: xc_dom_boot_image()
>>>> for a 4Gb PV guest takes about 70% longer now. Otoh it is itself
>>>> responsible for less than 10% of the overall time libxl__build_dom()
>>>> takes, and that in turn is only a pretty small portion of the overall
>>>> "xl create".
>>> Do you think that slowdown is OK?  I'm not sure -- I'd be inclined to
>>> avoid it, but could be persuaded, and it's not me doing the work. :)
>> Well, if there was a way to avoid it in a clean way without too much
>> code churn, I'd be all for avoiding it. The avenues we've explored so
>> far either didn't work (using pg_owner's dirty mask) or didn't promise
>> to actually reduce the flush overhead in a meaningful way (adding a
>> separate mask to be merged into the mask used for the flush in
>> __get_page_type()), unless - as has been the case before - I didn't
>> fully understand your thoughts there.
> Quoting your earlier response:
>
>> Wouldn't it suffice to set bits in this mask in put_page_from_l1e()
>> and consume/clear them in __get_page_type()? Right now I can't
>> see it being necessary for correctness to fiddle with any of the
>> other flushes using the domain dirty mask.
>>
>> But then again this may not be much of a win, unless the put
>> operations come through in meaningful batches, not interleaved
>> by any type changes (the latter ought to be guaranteed during
>> domain construction and teardown at least, as the guest itself
>> can't do anything at that time to effect type changes).
> I'm not sure how much batching there needs to be.  I agree that the
> domain creation case should work well though.  Let me think about the
> scenarios when dom B is live:
>
> 1. Dom A drops its foreign map of page X; dom B immediately changes the
> type of page X.  This case isn't helped at all, but I don't see any
> way to improve it -- dom A's TLBs need to be flushed right away.
>
> 2. Dom A drops its foreign map of page X; dom B immediately changes
> the type of page Y.  Now dom A's dirty CPUs are in the new map, but B
> may not need to flush them right away.  B can filter by page Y's
> timestamp, and flush (and clear) only some of the cpus in the map.
>
> So that seems good, but then there's a risk that cpus never get
> cleared from the map, and __get_page_type() ends up doing a lot of
> unnecessary work filtering timestaps.  When is it safe to remove a CPU
> from that map?
>  - obvs safe if we IPI it to flush the TLB (though may need memory
>    barriers -- need to think about a race with CPU C putting A _into_
>    the map at the same time...)
>  - we could track the timestamp of the most recent addition to the
>    map, and drop any CPU whose TLB has been flushed since that,
>    but that still lets unrelated unmaps keep CPUs alive in the map...
>  - we could double-buffer the map: always add CPUs to the active map;
>    from time to time, swap maps and flush everything in the non-active
>    map (filtered by the TLB timestamp when we last swapped over).
>
> Bah, this is turning into a tar pit.  Let's stick to the v2 patch as
> being (relatively) simple and correct, and revisit this if it causes
> trouble. :)

:(

A 70% performance hit for guest creation is certainly going to cause
problems, but we obviously need to prioritise correctness in this case.

~Andrew


* Re: [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference
  2017-05-02 17:37                   ` Andrew Cooper
@ 2017-05-03  7:21                     ` Jan Beulich
  2017-05-03  9:55                       ` Andrew Cooper
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2017-05-03  7:21 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Julien Grall, Tim Deegan, Jann Horn

>>> On 02.05.17 at 19:37, <andrew.cooper3@citrix.com> wrote:
> On 02/05/17 10:43, Tim Deegan wrote:
>> At 02:50 -0600 on 02 May (1493693403), Jan Beulich wrote:
>>>>>> On 02.05.17 at 10:32, <tim@xen.org> wrote:
>>>> At 04:52 -0600 on 28 Apr (1493355160), Jan Beulich wrote:
>>>>>>>> On 27.04.17 at 11:51, <tim@xen.org> wrote:
>>>>>> At 03:23 -0600 on 27 Apr (1493263380), Jan Beulich wrote:
>>>>>>> ... it wouldn't better be the other way around: We use the patch
>>>>>>> in its current (or even v1) form, and try to do something about
>>>>>>> performance only if we really find a case where it matters. To be
>>>>>>> honest, I'm not even sure how I could meaningfully measure the
>>>>>>> impact here: Simply counting how many extra flushes there would
>>>>>>> end up being wouldn't seem all that useful, and whether there
>>>>>>> would be any measurable difference in the overall execution time
>>>>>>> of e.g. domain creation I would highly doubt (but if it's that what
>>>>>>> you're after, I could certainly collect a few numbers).
>>>>>> I think that would be a good idea, just as a sanity-check.
>>>>> As it turns out there is a measurable effect: xc_dom_boot_image()
>>>>> for a 4Gb PV guest takes about 70% longer now. Otoh it is itself
>>>>> responsible for less than 10% of the overall time libxl__build_dom()
>>>>> takes, and that in turn is only a pretty small portion of the overall
>>>>> "xl create".
>>>> Do you think that slowdown is OK?  I'm not sure -- I'd be inclined to
>>>> avoid it, but could be persuaded, and it's not me doing the work. :)
>>> Well, if there was a way to avoid it in a clean way without too much
>>> code churn, I'd be all for avoiding it. The avenues we've explored so
>>> far either didn't work (using pg_owner's dirty mask) or didn't promise
>>> to actually reduce the flush overhead in a meaningful way (adding a
>>> separate mask to be merged into the mask used for the flush in
>>> __get_page_type()), unless - as has been the case before - I didn't
>>> fully understand your thoughts there.
>> Quoting your earlier response:
>>
>>> Wouldn't it suffice to set bits in this mask in put_page_from_l1e()
>>> and consume/clear them in __get_page_type()? Right now I can't
>>> see it being necessary for correctness to fiddle with any of the
>>> other flushes using the domain dirty mask.
>>>
>>> But then again this may not be much of a win, unless the put
>>> operations come through in meaningful batches, not interleaved
>>> by any type changes (the latter ought to be guaranteed during
>>> domain construction and teardown at least, as the guest itself
>>> can't do anything at that time to effect type changes).
>> I'm not sure how much batching there needs to be.  I agree that the
>> domain creation case should work well though.  Let me think about the
>> scenarios when dom B is live:
>>
>> 1. Dom A drops its foreign map of page X; dom B immediately changes the
>> type of page X.  This case isn't helped at all, but I don't see any
>> way to improve it -- dom A's TLBs need to be flushed right away.
>>
>> 2. Dom A drops its foreign map of page X; dom B immediately changes
>> the type of page Y.  Now dom A's dirty CPUs are in the new map, but B
>> may not need to flush them right away.  B can filter by page Y's
>> timestamp, and flush (and clear) only some of the cpus in the map.
>>
>> So that seems good, but then there's a risk that cpus never get
>> cleared from the map, and __get_page_type() ends up doing a lot of
>> unnecessary work filtering timestaps.  When is it safe to remove a CPU
>> from that map?
>>  - obvs safe if we IPI it to flush the TLB (though may need memory
>>    barriers -- need to think about a race with CPU C putting A _into_
>>    the map at the same time...)
>>  - we could track the timestamp of the most recent addition to the
>>    map, and drop any CPU whose TLB has been flushed since that,
>>    but that still lets unrelated unmaps keep CPUs alive in the map...
>>  - we could double-buffer the map: always add CPUs to the active map;
>>    from time to time, swap maps and flush everything in the non-active
>>    map (filtered by the TLB timestamp when we last swapped over).
>>
>> Bah, this is turning into a tar pit.  Let's stick to the v2 patch as
>> being (relatively) simple and correct, and revisit this if it causes
>> trouble. :)
> 
> :(
> 
> A 70% performance hit for guest creation is certainly going to cause
> problems, but we obviously need to prioritise correctness in this case.

Hmm, you did understand that the 70% hit is on a specific sub-part
of the overall process, not guest creation as a whole? Anyway,
your reply is neither an ack nor a nak nor an indication of what needs
to change ...

Jan


* Re: [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference
  2017-05-03  7:21                     ` Jan Beulich
@ 2017-05-03  9:55                       ` Andrew Cooper
  2017-05-05 18:16                         ` Marcus Granado
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Cooper @ 2017-05-03  9:55 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Julien Grall, Tim Deegan, Jann Horn

On 03/05/17 08:21, Jan Beulich wrote:
>>>> On 02.05.17 at 19:37, <andrew.cooper3@citrix.com> wrote:
>> On 02/05/17 10:43, Tim Deegan wrote:
>>> At 02:50 -0600 on 02 May (1493693403), Jan Beulich wrote:
>>>>>>> On 02.05.17 at 10:32, <tim@xen.org> wrote:
>>>>> At 04:52 -0600 on 28 Apr (1493355160), Jan Beulich wrote:
>>>>>>>>> On 27.04.17 at 11:51, <tim@xen.org> wrote:
>>>>>>> At 03:23 -0600 on 27 Apr (1493263380), Jan Beulich wrote:
>>>>>>>> ... it wouldn't better be the other way around: We use the patch
>>>>>>>> in its current (or even v1) form, and try to do something about
>>>>>>>> performance only if we really find a case where it matters. To be
>>>>>>>> honest, I'm not even sure how I could meaningfully measure the
>>>>>>>> impact here: Simply counting how many extra flushes there would
>>>>>>>> end up being wouldn't seem all that useful, and whether there
>>>>>>>> would be any measurable difference in the overall execution time
>>>>>>>> of e.g. domain creation I would highly doubt (but if it's that what
>>>>>>>> you're after, I could certainly collect a few numbers).
>>>>>>> I think that would be a good idea, just as a sanity-check.
>>>>>> As it turns out there is a measurable effect: xc_dom_boot_image()
>>>>>> for a 4Gb PV guest takes about 70% longer now. Otoh it is itself
>>>>>> responsible for less than 10% of the overall time libxl__build_dom()
>>>>>> takes, and that in turn is only a pretty small portion of the overall
>>>>>> "xl create".
>>>>> Do you think that slowdown is OK?  I'm not sure -- I'd be inclined to
>>>>> avoid it, but could be persuaded, and it's not me doing the work. :)
>>>> Well, if there was a way to avoid it in a clean way without too much
>>>> code churn, I'd be all for avoiding it. The avenues we've explored so
>>>> far either didn't work (using pg_owner's dirty mask) or didn't promise
>>>> to actually reduce the flush overhead in a meaningful way (adding a
>>>> separate mask to be merged into the mask used for the flush in
>>>> __get_page_type()), unless - as has been the case before - I didn't
>>>> fully understand your thoughts there.
>>> Quoting your earlier response:
>>>
>>>> Wouldn't it suffice to set bits in this mask in put_page_from_l1e()
>>>> and consume/clear them in __get_page_type()? Right now I can't
>>>> see it being necessary for correctness to fiddle with any of the
>>>> other flushes using the domain dirty mask.
>>>>
>>>> But then again this may not be much of a win, unless the put
>>>> operations come through in meaningful batches, not interleaved
>>>> by any type changes (the latter ought to be guaranteed during
>>>> domain construction and teardown at least, as the guest itself
>>>> can't do anything at that time to effect type changes).
>>> I'm not sure how much batching there needs to be.  I agree that the
>>> domain creation case should work well though.  Let me think about the
>>> scenarios when dom B is live:
>>>
>>> 1. Dom A drops its foreign map of page X; dom B immediately changes the
>>> type of page X.  This case isn't helped at all, but I don't see any
>>> way to improve it -- dom A's TLBs need to be flushed right away.
>>>
>>> 2. Dom A drops its foreign map of page X; dom B immediately changes
>>> the type of page Y.  Now dom A's dirty CPUs are in the new map, but B
>>> may not need to flush them right away.  B can filter by page Y's
>>> timestamp, and flush (and clear) only some of the cpus in the map.
>>>
>>> So that seems good, but then there's a risk that cpus never get
>>> cleared from the map, and __get_page_type() ends up doing a lot of
>>> unnecessary work filtering timestamps.  When is it safe to remove a CPU
>>> from that map?
>>>  - obvs safe if we IPI it to flush the TLB (though may need memory
>>>    barriers -- need to think about a race with CPU C putting A _into_
>>>    the map at the same time...)
>>>  - we could track the timestamp of the most recent addition to the
>>>    map, and drop any CPU whose TLB has been flushed since that,
>>>    but that still lets unrelated unmaps keep CPUs alive in the map...
>>>  - we could double-buffer the map: always add CPUs to the active map;
>>>    from time to time, swap maps and flush everything in the non-active
>>>    map (filtered by the TLB timestamp when we last swapped over).
>>>
>>> Bah, this is turning into a tar pit.  Let's stick to the v2 patch as
>>> being (relatively) simple and correct, and revisit this if it causes
>>> trouble. :)
>> :(
>>
>> A 70% performance hit for guest creation is certainly going to cause
>> problems, but we obviously need to prioritise correctness in this case.
> Hmm, you did understand that the 70% hit is on a specific sub-part
> of the overall process, not guest creation as a whole? Anyway,
> your reply is neither an ack nor a nak nor an indication of what needs
> to change ...

Yes - I realise it isn't all of domain creation, but this performance
hit will also hit migration, qemu DMA mappings, etc.

XenServer has started a side-by-side performance work-up of this change,
as presented at the root of this thread.  We should hopefully have some
numbers in the next day or two.

~Andrew
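
The "double-buffer the map" variant quoted above can be illustrated with a small, self-contained toy model (plain C, not Xen code; all names are invented, and the memory-barrier and race questions raised above are ignored): CPUs are always added to the active map, and a periodic maintenance step swaps the maps and flushes whatever is left in the old one, skipping CPUs whose TLBs have been flushed since the previous swap.

    #include <stdint.h>
    #include <stdio.h>

    #define NCPUS 8

    static uint64_t tlb_flush_time[NCPUS]; /* last (fake) time each CPU flushed */
    static uint64_t now;                   /* fake global clock */

    static uint64_t cpu_map[2];            /* the two CPU bitmaps */
    static int active;                     /* which bitmap is currently active */
    static uint64_t last_swap_time;

    /* put_page_from_l1e() analogue: record a CPU that may hold a stale entry. */
    static void defer_cpu(unsigned int cpu)
    {
        cpu_map[active] |= UINT64_C(1) << cpu;
    }

    /* Model of sending a TLB flush IPI to one CPU. */
    static void flush_cpu(unsigned int cpu)
    {
        tlb_flush_time[cpu] = ++now;
        printf("flush cpu %u\n", cpu);
    }

    /* Periodic maintenance: swap maps, then flush what remains in the old
     * map, skipping CPUs that already flushed since the previous swap. */
    static void swap_and_flush(void)
    {
        int old = active;
        unsigned int cpu;

        active = !active;
        for ( cpu = 0; cpu < NCPUS; cpu++ )
            if ( ((cpu_map[old] >> cpu) & 1) &&
                 tlb_flush_time[cpu] <= last_swap_time )
                flush_cpu(cpu);
        cpu_map[old] = 0;
        last_swap_time = ++now;
    }

    int main(void)
    {
        defer_cpu(1);
        defer_cpu(3);
        flush_cpu(3);     /* CPU 3 flushed for some unrelated reason */
        swap_and_flush(); /* only CPU 1 still needs an explicit flush */
        return 0;
    }

The second map merely bounds how long unrelated unmaps can keep a CPU alive in the map, which is the concern in the second bullet above.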


* Re: [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference
  2017-05-03  9:55                       ` Andrew Cooper
@ 2017-05-05 18:16                         ` Marcus Granado
  2017-05-05 18:29                           ` Andrew Cooper
  0 siblings, 1 reply; 18+ messages in thread
From: Marcus Granado @ 2017-05-05 18:16 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich
  Cc: xen-devel, Julien Grall, Tim (Xen.org), Jann Horn

On 03/05/17 10:56, Andrew Cooper wrote:
> On 03/05/17 08:21, Jan Beulich wrote:
> >>>> On 02.05.17 at 19:37, <andrew.cooper3@citrix.com> wrote:
> >> On 02/05/17 10:43, Tim Deegan wrote:
> >>> At 02:50 -0600 on 02 May (1493693403), Jan Beulich wrote:
> >>>>>>> On 02.05.17 at 10:32, <tim@xen.org> wrote:
> >>>>> At 04:52 -0600 on 28 Apr (1493355160), Jan Beulich wrote:
> >>>>>>>>> On 27.04.17 at 11:51, <tim@xen.org> wrote:
> >>>>>>> At 03:23 -0600 on 27 Apr (1493263380), Jan Beulich wrote:
> >>>>>>>> ... it wouldn't better be the other way around: We use the
> >>>>>>>> patch in its current (or even v1) form, and try to do something
> >>>>>>>> about performance only if we really find a case where it
> >>>>>>>> matters. To be honest, I'm not even sure how I could
> >>>>>>>> meaningfully measure the impact here: Simply counting how many
> >>>>>>>> extra flushes there would end up being wouldn't seem all that
> >>>>>>>> useful, and whether there would be any measurable difference in
> >>>>>>>> the overall execution time of e.g. domain creation I would
> >>>>>>>> highly doubt (but if it's that what you're after, I could certainly
> >>>>>>>> collect a few numbers).
> >>>>>>> I think that would be a good idea, just as a sanity-check.
> >>>>>> As it turns out there is a measurable effect: xc_dom_boot_image()
> >>>>>> for a 4Gb PV guest takes about 70% longer now. Otoh it is itself
> >>>>>> responsible for less than 10% of the overall time
> >>>>>> libxl__build_dom() takes, and that in turn is only a pretty small
> >>>>>> portion of the overall "xl create".
> >>>>> Do you think that slowdown is OK?  I'm not sure -- I'd be inclined
> >>>>> to avoid it, but could be persuaded, and it's not me doing the
> >>>>> work. :)
> >>>> Well, if there was a way to avoid it in a clean way without too
> >>>> much code churn, I'd be all for avoiding it. The avenues we've
> >>>> explored so far either didn't work (using pg_owner's dirty mask) or
> >>>> didn't promise to actually reduce the flush overhead in a
> >>>> meaningful way (adding a separate mask to be merged into the mask
> >>>> used for the flush in __get_page_type()), unless - as has been the
> >>>> case before - I didn't fully understand your thoughts there.
> >>> Quoting your earlier response:
> >>>
> >>>> Wouldn't it suffice to set bits in this mask in put_page_from_l1e()
> >>>> and consume/clear them in __get_page_type()? Right now I can't see
> >>>> it being necessary for correctness to fiddle with any of the other
> >>>> flushes using the domain dirty mask.
> >>>>
> >>>> But then again this may not be much of a win, unless the put
> >>>> operations come through in meaningful batches, not interleaved by
> >>>> any type changes (the latter ought to be guaranteed during domain
> >>>> construction and teardown at least, as the guest itself can't do
> >>>> anything at that time to effect type changes).
> >>> I'm not sure how much batching there needs to be.  I agree that the
> >>> domain creation case should work well though.  Let me think about
> >>> the scenarios when dom B is live:
> >>>
> >>> 1. Dom A drops its foreign map of page X; dom B immediately changes
> >>> the type of page X.  This case isn't helped at all, but I don't see
> >>> any way to improve it -- dom A's TLBs need to be flushed right away.
> >>>
> >>> 2. Dom A drops its foreign map of page X; dom B immediately changes
> >>> the type of page Y.  Now dom A's dirty CPUs are in the new map, but
> >>> B may not need to flush them right away.  B can filter by page Y's
> >>> timestamp, and flush (and clear) only some of the cpus in the map.
> >>>
> >>> So that seems good, but then there's a risk that cpus never get
> >>> cleared from the map, and __get_page_type() ends up doing a lot of
> >>> unnecessary work filtering timestamps.  When is it safe to remove a
> >>> CPU from that map?
> >>>  - obvs safe if we IPI it to flush the TLB (though may need memory
> >>>    barriers -- need to think about a race with CPU C putting A _into_
> >>>    the map at the same time...)
> >>>  - we could track the timestamp of the most recent addition to the
> >>>    map, and drop any CPU whose TLB has been flushed since that,
> >>>    but that still lets unrelated unmaps keep CPUs alive in the map...
> >>>  - we could double-buffer the map: always add CPUs to the active map;
> >>>    from time to time, swap maps and flush everything in the non-active
> >>>    map (filtered by the TLB timestamp when we last swapped over).
> >>>
> >>> Bah, this is turning into a tar pit.  Let's stick to the v2 patch as
> >>> being (relatively) simple and correct, and revisit this if it causes
> >>> trouble. :)
> >> :(
> >>
> >> A 70% performance hit for guest creation is certainly going to cause
> >> problems, but we obviously need to prioritise correctness in this case.
> > Hmm, you did understand that the 70% hit is on a specific sub-part of
> > the overall process, not guest creation as a whole? Anyway, your reply
> > is neither an ack nor a nak nor an indication of what needs to change
> > ...
> 
> Yes - I realise it isn't all of domain creation, but this performance hit will also
> hit migration, qemu DMA mappings, etc.
> 
> XenServer has started a side-by-side performance work-up of this change, as
> presented at the root of this thread.  We should hopefully have some
> numbers in the next day or two.
> 

I did some measurements on two builds of a recent version of XenServer using Xen upstream 4.9.0-3.0. The only difference between the builds was the patch x86-put-l1e-foreign-flush.patch  in https://lists.xenproject.org/archives/html/xen-devel/2017-04/msg02945.html.

I observed no measurable difference between these builds with guest RAM values of 4G, 8G and 14G for the following operations:
- time xe vm-start
- time xe vm-shutdown
- vm downtime during "xe vm-migrate" (measured by pinging the vm during migration and checking for how long pings failed while both domains were paused)
- time xe vm-migrate # for HVM guests (eg. win7 and win10)

But I observed a difference for the duration of "time xe vm-migrate" for PV guests (eg. centos68, debian70, ubuntu1204). For centos68, for instance, I obtained the following values on a machine with an Intel E3-1281v3 3.7GHz CPU, averaged over 10 runs for each data point:
| Guest RAM | no patch | with patch | difference | diff/RAM |
|   14GB    |  10.44s  |   13.46s   |   3.02s    | 0.22s/GB |
|    8GB    |   6.46s  |    8.28s   |   1.82s    | 0.23s/GB |
|    4GB    |   3.85s  |    4.74s   |   0.89s    | 0.22s/GB |

From these numbers, if the patch is present, it looks like VM migration of a PV guest would take an extra 1s for each extra 5GB of guest RAM. The VMs are mostly idle during migration. At this point, it's not clear to me why this difference is only visible on VM migration (as opposed to VM start for example), and only on a PV guest (as opposed to an HVM).

Marcus
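
As a quick arithmetic check of the "extra 1s per 5GB" reading of the table above, using the 14GB row:

$$
\frac{3.02\ \text{s}}{14\ \text{GB}} \approx 0.22\ \text{s/GB}
\quad\Longrightarrow\quad
\frac{1\ \text{s}}{0.22\ \text{s/GB}} \approx 4.6\ \text{GB},
$$

i.e. roughly one extra second of migration time per 5GB of PV guest RAM; the 8GB and 4GB rows give the same rate to within measurement noise.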


* Re: [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference
  2017-05-05 18:16                         ` Marcus Granado
@ 2017-05-05 18:29                           ` Andrew Cooper
  0 siblings, 0 replies; 18+ messages in thread
From: Andrew Cooper @ 2017-05-05 18:29 UTC (permalink / raw)
  To: Marcus Granado, Jan Beulich
  Cc: xen-devel, Julien Grall, Tim (Xen.org), Jann Horn

On 05/05/17 19:16, Marcus Granado wrote:
> On 03/05/17 10:56, Andrew Cooper wrote:
>> On 03/05/17 08:21, Jan Beulich wrote:
>>>>>> On 02.05.17 at 19:37, <andrew.cooper3@citrix.com> wrote:
>>>> On 02/05/17 10:43, Tim Deegan wrote:
>>>>> At 02:50 -0600 on 02 May (1493693403), Jan Beulich wrote:
>>>>>>>>> On 02.05.17 at 10:32, <tim@xen.org> wrote:
>>>>>>> At 04:52 -0600 on 28 Apr (1493355160), Jan Beulich wrote:
>>>>>>>>>>> On 27.04.17 at 11:51, <tim@xen.org> wrote:
>>>>>>>>> At 03:23 -0600 on 27 Apr (1493263380), Jan Beulich wrote:
>>>>>>>>>> ... it wouldn't better be the other way around: We use the
>>>>>>>>>> patch in its current (or even v1) form, and try to do something
>>>>>>>>>> about performance only if we really find a case where it
>>>>>>>>>> matters. To be honest, I'm not even sure how I could
>>>>>>>>>> meaningfully measure the impact here: Simply counting how many
>>>>>>>>>> extra flushes there would end up being wouldn't seem all that
>>>>>>>>>> useful, and whether there would be any measurable difference in
>>>>>>>>>> the overall execution time of e.g. domain creation I would
>>>>>>>>>> highly doubt (but if it's that what you're after, I could certainly
>>>>>>>>>> collect a few numbers).
>>>>>>>>> I think that would be a good idea, just as a sanity-check.
>>>>>>>> As it turns out there is a measurable effect: xc_dom_boot_image()
>>>>>>>> for a 4Gb PV guest takes about 70% longer now. Otoh it is itself
>>>>>>>> responsible for less than 10% of the overall time
>>>>>>>> libxl__build_dom() takes, and that in turn is only a pretty small
>>>>>>>> portion of the overall "xl create".
>>>>>>> Do you think that slowdown is OK?  I'm not sure -- I'd be inclined
>>>>>>> to avoid it, but could be persuaded, and it's not me doing the
>>>>>>> work. :)
>>>>>> Well, if there was a way to avoid it in a clean way without too
>>>>>> much code churn, I'd be all for avoiding it. The avenues we've
>>>>>> explored so far either didn't work (using pg_owner's dirty mask) or
>>>>>> didn't promise to actually reduce the flush overhead in a
>>>>>> meaningful way (adding a separate mask to be merged into the mask
>>>>>> used for the flush in __get_page_type()), unless - as has been the
>>>>>> case before - I didn't fully understand your thoughts there.
>>>>> Quoting your earlier response:
>>>>>
>>>>>> Wouldn't it suffice to set bits in this mask in put_page_from_l1e()
>>>>>> and consume/clear them in __get_page_type()? Right now I can't see
>>>>>> it being necessary for correctness to fiddle with any of the other
>>>>>> flushes using the domain dirty mask.
>>>>>>
>>>>>> But then again this may not be much of a win, unless the put
>>>>>> operations come through in meaningful batches, not interleaved by
>>>>>> any type changes (the latter ought to be guaranteed during domain
>>>>>> construction and teardown at least, as the guest itself can't do
>>>>>> anything at that time to effect type changes).
>>>>> I'm not sure how much batching there needs to be.  I agree that the
>>>>> domain creation case should work well though.  Let me think about
>>>>> the scenarios when dom B is live:
>>>>>
>>>>> 1. Dom A drops its foreign map of page X; dom B immediately changes
>>>>> the type of page X.  This case isn't helped at all, but I don't see
>>>>> any way to improve it -- dom A's TLBs need to be flushed right away.
>>>>>
>>>>> 2. Dom A drops its foreign map of page X; dom B immediately changes
>>>>> the type of page Y.  Now dom A's dirty CPUs are in the new map, but
>>>>> B may not need to flush them right away.  B can filter by page Y's
>>>>> timestamp, and flush (and clear) only some of the cpus in the map.
>>>>>
>>>>> So that seems good, but then there's a risk that cpus never get
>>>>> cleared from the map, and __get_page_type() ends up doing a lot of
>>>>> unnecessary work filtering timestamps.  When is it safe to remove a
>>>>> CPU from that map?
>>>>>  - obvs safe if we IPI it to flush the TLB (though may need memory
>>>>>    barriers -- need to think about a race with CPU C putting A _into_
>>>>>    the map at the same time...)
>>>>>  - we could track the timestamp of the most recent addition to the
>>>>>    map, and drop any CPU whose TLB has been flushed since that,
>>>>>    but that still lets unrelated unmaps keep CPUs alive in the map...
>>>>>  - we could double-buffer the map: always add CPUs to the active map;
>>>>>    from time to time, swap maps and flush everything in the non-active
>>>>>    map (filtered by the TLB timestamp when we last swapped over).
>>>>>
>>>>> Bah, this is turning into a tar pit.  Let's stick to the v2 patch as
>>>>> being (relatively) simple and correct, and revisit this if it causes
>>>>> trouble. :)
>>>> :(
>>>>
>>>> A 70% performance hit for guest creation is certainly going to cause
>>>> problems, but we obviously need to prioritise correctness in this case.
>>> Hmm, you did understand that the 70% hit is on a specific sub-part of
>>> the overall process, not guest creation as a whole? Anyway, your reply
>>> is neither an ack nor a nak nor an indication of what needs to change
>>> ...
>> Yes - I realise it isn't all of domain creation, but this performance hit will also
>> hit migration, qemu DMA mappings, etc.
>>
>> XenServer has started a side-by-side performance work-up of this change, as
>> presented at the root of this thread.  We should hopefully have some
>> numbers in the next day or two.
>>
> I did some measurements on two builds of a recent version of XenServer using Xen upstream 4.9.0-3.0.

The upstream base of this XenServer build was c/s ba39e9b, one change
past 4.9.0-rc3.

>  The only difference between the builds was the patch x86-put-l1e-foreign-flush.patch  in https://lists.xenproject.org/archives/html/xen-devel/2017-04/msg02945.html.
>
> I observed no measurable difference between these builds with a guest RAM value of 4G, 8G and 14G for the following operations:
> - time xe vm-start
> - time xe vm-shutdown
> - vm downtime during "xe vm-migration" (as measured by pinging the vm during migration and verifying for how long pings would fail when both domains are paused)
> - time xe vm-migrate # for HVM guests (eg. win7 and win10)
>
> But I observed a difference for the duration of "time xe vm-migrate" for PV guests (eg. centos68, debian70, ubuntu1204). For centos68, for instance, I obtained the following values on a machine with an Intel E3-1281v3 3.7GHz CPU, averaged over 10 runs for each data point:
> |   Guest RAM   |  no patch  | with patch | difference |  diff/RAM | 
> |   14GB        |   10.44s   |   13.46s   |    3.02s   |    0.22s/GB    |
> |    8GB        |    6.46s   |    8.28s   |    1.82s   |    0.23s/GB    |
> |    4GB        |    3.85s   |    4.74s   |    0.89s   |    0.22s/GB    |
>
> From these numbers, if the patch is present, it looks like VM migration of a PV guest would take an extra 1s for each extra 5GB of guest RAM. The VMs are mostly idle during migration. At this point, it's not clear to me why this difference is only visible on VM migration (as opposed to VM start for example), and only on a PV guest (as opposed to an HVM).

The difference between start and migrate can be explained.

During domain creation, we only have to foreign map the areas of the
guest we need to write into (guest kernel/initrd, or hvmloader/acpi
tables), which is independent of the quantity of RAM the guest has.

During migration, we must foreign map all guest RAM.  Furthermore, we
unmap and potentially remap again later if the RAM gets dirtied.  (A
64bit toolstack process could potentially foreign map the entire guest
and reuse the mappings.  A 32bit toolstack process very definitely
can't, so the migration logic uses the simpler approach of not reusing
mappings at all.)

I am at a loss to explain why the overhead is only observed when
migrating PV guests.  I would expect migrating HVM guests to have
identical properties in this regard.

~Andrew
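
To make the per-GB scaling described above concrete, here is a rough sketch of the kind of loop the receiving side of a migration (or anything else that writes into a guest, such as the domain builder) runs over every page of guest RAM. It is an illustration only, not the real libxc migration code: it assumes the Xen 4.9-era xc_map_foreign_bulk() interface, omits all error handling, and the buffer handling is a placeholder. The relevant part is the munmap(), which is where the writable foreign references are put.

    #include <string.h>
    #include <sys/mman.h>
    #include <xenctrl.h>

    #define BATCH 1024

    /*
     * Fill a guest's memory: map each batch of pfns writable, copy the
     * data in, and unmap again.  With the v2 patch applied, tearing down
     * each writable foreign mapping can now involve an extra TLB flush in
     * the hypervisor, so the cost grows with the amount of guest RAM.
     */
    static int fill_guest_ram(xc_interface *xch, uint32_t domid,
                              xen_pfn_t nr_pfns, const void *inbuf)
    {
        xen_pfn_t pfns[BATCH];
        int errs[BATCH];   /* per-page errors, ignored in this sketch */
        xen_pfn_t base;

        for ( base = 0; base < nr_pfns; base += BATCH )
        {
            unsigned int i, n = (nr_pfns - base < BATCH) ? nr_pfns - base : BATCH;
            void *map;

            for ( i = 0; i < n; i++ )
                pfns[i] = base + i;

            map = xc_map_foreign_bulk(xch, domid, PROT_READ | PROT_WRITE,
                                      pfns, errs, n);
            if ( !map )
                return -1;

            memcpy(map, (const char *)inbuf + base * XC_PAGE_SIZE,
                   n * XC_PAGE_SIZE);

            /* This is where the writable foreign references get put. */
            munmap(map, n * XC_PAGE_SIZE);
        }

        return 0;
    }

Domain creation only pushes the kernel and initrd through such a loop, which is consistent with vm-start being unaffected in the measurements above while PV migration time grows with guest RAM.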


Thread overview: 18+ messages
2017-04-25  8:59 [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference Jan Beulich
2017-04-25 10:59 ` Tim Deegan
2017-04-25 11:59   ` Jan Beulich
2017-04-26  8:44     ` Tim Deegan
2017-04-26  9:01       ` Jan Beulich
2017-04-26 14:07   ` Jan Beulich
2017-04-26 14:25     ` Tim Deegan
2017-04-27  9:23       ` Jan Beulich
2017-04-27  9:51         ` Tim Deegan
2017-04-28 10:52           ` Jan Beulich
2017-05-02  8:32             ` Tim Deegan
2017-05-02  8:50               ` Jan Beulich
2017-05-02  9:43                 ` Tim Deegan
2017-05-02 17:37                   ` Andrew Cooper
2017-05-03  7:21                     ` Jan Beulich
2017-05-03  9:55                       ` Andrew Cooper
2017-05-05 18:16                         ` Marcus Granado
2017-05-05 18:29                           ` Andrew Cooper
