All of lore.kernel.org
 help / color / mirror / Atom feed
* [Xen-devel] [PATCH 0/4] x86: Remove force-invalidate loop from relinqusish_memory
@ 2019-12-23 16:43 George Dunlap
  2019-12-23 16:43 ` [Xen-devel] [PATCH 1/4] xen: Remove trailing whitespace from time.c George Dunlap
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: George Dunlap @ 2019-12-23 16:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, George Dunlap, Jan Beulich

George Dunlap (4):
  xen: Remove trailing whitespace from time.c
  xen: Add 'synthetic' preemption check parameter
  mm: Use put_old_guest_table for relinquish_pages
  x86/mm: Remove force-invalidate loop

 docs/misc/xen-command-line.pandoc |  20 ++++-
 xen/arch/x86/domain.c             | 121 ++++++------------------------
 xen/arch/x86/time.c               |  41 ++++++----
 xen/include/xen/sched.h           |  10 ++-
 4 files changed, 75 insertions(+), 117 deletions(-)

CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>

--
2.24.0

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Xen-devel] [PATCH 1/4] xen: Remove trailing whitespace from time.c
  2019-12-23 16:43 [Xen-devel] [PATCH 0/4] x86: Remove force-invalidate loop from relinqusish_memory George Dunlap
@ 2019-12-23 16:43 ` George Dunlap
  2019-12-27  8:02   ` Jan Beulich
  2019-12-23 16:43 ` [Xen-devel] [PATCH 2/4] xen: Add 'synthetic' preemption check parameter George Dunlap
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: George Dunlap @ 2019-12-23 16:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, George Dunlap, Jan Beulich

No functional changes.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/time.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index ea696a95e8..64e471a39b 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1,10 +1,10 @@
 /******************************************************************************
  * arch/x86/time.c
- * 
+ *
  * Per-CPU time calibration and management.
- * 
+ *
  * Copyright (c) 2002-2005, K A Fraser
- * 
+ *
  * Portions from Linux are:
  * Copyright (c) 1991, 1992, 1995  Linus Torvalds
  */
@@ -78,8 +78,8 @@ static struct timer calibration_timer;
  * We simulate a 32-bit platform timer from the 16-bit PIT ch2 counter.
  * Otherwise overflow happens too quickly (~50ms) for us to guarantee that
  * softirq handling will happen in time.
- * 
- * The pit_lock protects the 16- and 32-bit stamp fields as well as the 
+ *
+ * The pit_lock protects the 16- and 32-bit stamp fields as well as the
  */
 static DEFINE_SPINLOCK(pit_lock);
 static u16 pit_stamp16;
@@ -100,7 +100,7 @@ static inline u32 div_frac(u32 dividend, u32 divisor)
 {
     u32 quotient, remainder;
     ASSERT(dividend < divisor);
-    asm ( 
+    asm (
         "divl %4"
         : "=a" (quotient), "=d" (remainder)
         : "0" (0), "1" (dividend), "r" (divisor) );
@@ -1011,7 +1011,7 @@ static void __get_cmos_time(struct rtc_time *rtc)
     rtc->day  = CMOS_READ(RTC_DAY_OF_MONTH);
     rtc->mon  = CMOS_READ(RTC_MONTH);
     rtc->year = CMOS_READ(RTC_YEAR);
-    
+
     if ( RTC_ALWAYS_BCD || !(CMOS_READ(RTC_CONTROL) & RTC_DM_BINARY) )
     {
         BCD_TO_BIN(rtc->sec);
@@ -1511,8 +1511,8 @@ static void check_tsc_warp(unsigned long tsc_khz, unsigned long *max_warp)
         spin_unlock(&sync_lock);
 
         /*
-         * Be nice every now and then (and also check whether measurement is 
-         * done [we also insert a 10 million loops safety exit, so we dont 
+         * Be nice every now and then (and also check whether measurement is
+         * done [we also insert a 10 million loops safety exit, so we dont
          * lock up in case the TSC readout is totally broken]):
          */
         if ( unlikely(!(i & 7)) )
@@ -1524,7 +1524,7 @@ static void check_tsc_warp(unsigned long tsc_khz, unsigned long *max_warp)
         }
 
         /*
-         * Outside the critical section we can now see whether we saw a 
+         * Outside the critical section we can now see whether we saw a
          * time-warp of the TSC going backwards:
          */
         if ( unlikely(prev > now) )
@@ -1806,11 +1806,11 @@ void init_percpu_time(void)
 }
 
 /*
- * On certain older Intel CPUs writing the TSC MSR clears the upper 32 bits. 
+ * On certain older Intel CPUs writing the TSC MSR clears the upper 32 bits.
  * Obviously we must not use write_tsc() on such CPUs.
  *
- * Additionally, AMD specifies that being able to write the TSC MSR is not an 
- * architectural feature (but, other than their manual says, also cannot be 
+ * Additionally, AMD specifies that being able to write the TSC MSR is not an
+ * architectural feature (but, other than their manual says, also cannot be
  * determined from CPUID bits).
  */
 static void __init tsc_check_writability(void)
@@ -2010,7 +2010,7 @@ void __init early_time_init(void)
 
     do_div(tmp, 1000);
     cpu_khz = (unsigned long)tmp;
-    printk("Detected %lu.%03lu MHz processor.\n", 
+    printk("Detected %lu.%03lu MHz processor.\n",
            cpu_khz / 1000, cpu_khz % 1000);
 
     setup_irq(0, 0, &irq0);
@@ -2025,7 +2025,7 @@ static int _disable_pit_irq(void(*hpet_broadcast_setup)(void))
         return -1;
 
     /*
-     * If we do not rely on PIT CH0 then we can use HPET for one-shot timer 
+     * If we do not rely on PIT CH0 then we can use HPET for one-shot timer
      * emulation when entering deep C states.
      * XXX dom0 may rely on RTC interrupt delivery, so only enable
      * hpet_broadcast if FSB mode available or if force_hpet_broadcast.
-- 
2.24.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Xen-devel] [PATCH 2/4] xen: Add 'synthetic' preemption check parameter
  2019-12-23 16:43 [Xen-devel] [PATCH 0/4] x86: Remove force-invalidate loop from relinqusish_memory George Dunlap
  2019-12-23 16:43 ` [Xen-devel] [PATCH 1/4] xen: Remove trailing whitespace from time.c George Dunlap
@ 2019-12-23 16:43 ` George Dunlap
  2019-12-27 13:57   ` Andrew Cooper
                     ` (2 more replies)
  2019-12-23 16:43 ` [Xen-devel] [PATCH 3/4] mm: Use put_old_guest_table for relinquish_pages George Dunlap
  2019-12-23 16:43 ` [Xen-devel] [PATCH 4/4] x86/mm: Remove force-invalidate loop George Dunlap
  3 siblings, 3 replies; 11+ messages in thread
From: George Dunlap @ 2019-12-23 16:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, George Dunlap, Jan Beulich

In order to better test hypervisor preemption paths, add an option to
artificially increase the number of preemptions.

While modifying xen-command-line.pandoc, escape some underscores, and
remove some trailing whitespace.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 docs/misc/xen-command-line.pandoc | 20 ++++++++++++++++++--
 xen/arch/x86/time.c               | 11 +++++++++++
 xen/include/xen/sched.h           | 10 +++++++++-
 3 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 981a5e2381..1a9fda8627 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -636,13 +636,29 @@ Available alternatives, with their meaning, are:
 Specify the USB controller to use, either by instance number (when going
 over the PCI busses sequentially) or by PCI device (must be on segment 0).
 
-### debug_stack_lines
+### debug\_stack\_lines
 > `= <integer>`
 
 > Default: `20`
 
 Limits the number lines printed in Xen stack traces.
 
+### debug-synthetic-preemption
+> `= <integer>`
+
+> Default: `0`
+
+Artificially increases rate at which `hypercall_preempt_check()`
+returns `true`, for debugging purposes, to a rate of one in `N`. (The
+default, `0`, disables the feature.)
+
+When promoting pagetables, for instance, `hypercall_preempt_check()`
+is called before processing each PTE.  Since there are 512 PTEs per
+page, a value of `1024` should result in pagetable promotion being
+interrupted every other page on average.
+
+Only available in DEBUG builds.
+
 ### debugtrace
 > `= [cpu:]<size>`
 
@@ -1690,7 +1706,7 @@ The following resources are available:
     CDP, one COS will corespond two CBMs other than one with CAT, due to the
     sum of CBMs is fixed, that means actual `cos_max` in use will automatically
     reduce to half when CDP is enabled.
-	
+
 ### pv-linear-pt (x86)
 > `= <boolean>`
 
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 64e471a39b..34302f81e7 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -43,6 +43,17 @@
 static char __initdata opt_clocksource[10];
 string_param("clocksource", opt_clocksource);
 
+#ifndef NDEBUG
+int debug_synthetic_preemption = 0;
+integer_param("debug-synthetic-preemption", debug_synthetic_preemption);
+
+bool synthetic_preemption_check(void) {
+    if ( debug_synthetic_preemption == 0 )
+        return false;
+    return !(rdtsc() % debug_synthetic_preemption);
+}
+#endif
+
 unsigned long __read_mostly cpu_khz;  /* CPU clock frequency in kHz. */
 DEFINE_SPINLOCK(rtc_lock);
 unsigned long pit0_ticks;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 9f7bc69293..c0071eee04 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -748,6 +748,13 @@ static inline void hypercall_cancel_continuation(struct vcpu *v)
     v->hcall_preempted = false;
 }
 
+#ifndef NDEBUG
+bool synthetic_preemption_check(void);
+#define synthetic_preemption_check synthetic_preemption_check
+#else
+#define synthetic_preempiton_check() false
+#endif
+
 /*
  * For long-running operations that must be in hypercall context, check
  * if there is background work to be done that should interrupt this
@@ -755,7 +762,8 @@ static inline void hypercall_cancel_continuation(struct vcpu *v)
  */
 #define hypercall_preempt_check() (unlikely(    \
         softirq_pending(smp_processor_id()) |   \
-        local_events_need_delivery()            \
+        local_events_need_delivery() |          \
+        synthetic_preemption_check()            \
     ))
 
 /*
-- 
2.24.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Xen-devel] [PATCH 3/4] mm: Use put_old_guest_table for relinquish_pages
  2019-12-23 16:43 [Xen-devel] [PATCH 0/4] x86: Remove force-invalidate loop from relinqusish_memory George Dunlap
  2019-12-23 16:43 ` [Xen-devel] [PATCH 1/4] xen: Remove trailing whitespace from time.c George Dunlap
  2019-12-23 16:43 ` [Xen-devel] [PATCH 2/4] xen: Add 'synthetic' preemption check parameter George Dunlap
@ 2019-12-23 16:43 ` George Dunlap
  2020-01-03 15:43   ` Jan Beulich
  2019-12-23 16:43 ` [Xen-devel] [PATCH 4/4] x86/mm: Remove force-invalidate loop George Dunlap
  3 siblings, 1 reply; 11+ messages in thread
From: George Dunlap @ 2019-12-23 16:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, George Dunlap, Jan Beulich

relinquish_pages() deals with interrupted de-validation in a fairly
ad-hoc way, by either re-setting PGT_pinned (in the case of EINTR) or
letting the page "fall through" to the "force invalidate" loop below.
This requires an extensive comment describing what needs to happen to
the type and count in each case, and why each works.  Additionally, it
turns out that at this point, the "force invalidate" loop is only
required to handle this ad-hoc continuation.

Replace this with the 'standard' way of dealing with restarting pages,
old_guest_table.  Call put_old_guest_table(current) at the top of the
function, and set current->arch.old_guest_table* as appropriate.  This
code is simpler, and mirrors other old_guest_table code in mm.c.  It
will also allow us to remove the force-invalidate loop entirely in a
subsequent patch.

While here, make the refcounting logic a bit easier to follow: We
always drop the general reference held by PGT_pinned, regardless of
what happens to the type count.  Rather than manually re-dropping the
refcount if put_page_and_type_preemptible() fails, just drop the
refcount unconditionally, and call put_page_type_preemptible()
instead.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/domain.c | 50 +++++++++++++++++++------------------------
 1 file changed, 22 insertions(+), 28 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index d9c63379cd..b7968463cb 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1953,6 +1953,10 @@ static int relinquish_memory(
     unsigned long     x, y;
     int               ret = 0;
 
+    ret = put_old_guest_table(current);
+    if ( ret )
+        return ret;
+
     /* Use a recursive lock, as we may enter 'free_domheap_page'. */
     spin_lock_recursive(&d->page_alloc_lock);
 
@@ -1967,42 +1971,32 @@ static int relinquish_memory(
         }
 
         if ( test_and_clear_bit(_PGT_pinned, &page->u.inuse.type_info) )
-            ret = put_page_and_type_preemptible(page);
+        {
+            /* Always drop the page ref associated with PGT_pinned */
+            put_page(page);
+            ret = put_page_type_preemptible(page);
+        }
         switch ( ret )
         {
         case 0:
             break;
-        case -ERESTART:
         case -EINTR:
-            /*
-             * -EINTR means PGT_validated has been re-set; re-set
-             * PGT_pinned again so that it gets picked up next time
-             * around.
-             *
-             * -ERESTART, OTOH, means PGT_partial is set instead.  Put
-             * it back on the list, but don't set PGT_pinned; the
-             * section below will finish off de-validation.  But we do
-             * need to drop the general ref associated with
-             * PGT_pinned, since put_page_and_type_preemptible()
-             * didn't do it.
-             *
-             * NB we can do an ASSERT for PGT_validated, since we
-             * "own" the type ref; but theoretically, the PGT_partial
-             * could be cleared by someone else.
-             */
-            if ( ret == -EINTR )
-            {
-                ASSERT(page->u.inuse.type_info & PGT_validated);
-                set_bit(_PGT_pinned, &page->u.inuse.type_info);
-            }
-            else
-                put_page(page);
+            ASSERT(page->u.inuse.type_info & PGT_validated);
+            /* Fallthrough */
+        case -ERESTART:
+            current->arch.old_guest_ptpg = NULL;
+            current->arch.old_guest_table = page;
+            current->arch.old_guest_table_partial = (ret == -ERESTART);
 
             ret = -ERESTART;
 
-            /* Put the page back on the list and drop the ref we grabbed above */
-            page_list_add(page, list);
-            put_page(page);
+            /* Make sure we don't lose track of the page */
+            page_list_add_tail(page, &d->arch.relmem_list);
+
+            /*
+             * NB that we've transferred the general ref acquired at
+             * the top of the loop to old_guest_table.
+             */
             goto out;
         default:
             BUG();
-- 
2.24.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Xen-devel] [PATCH 4/4] x86/mm: Remove force-invalidate loop
  2019-12-23 16:43 [Xen-devel] [PATCH 0/4] x86: Remove force-invalidate loop from relinqusish_memory George Dunlap
                   ` (2 preceding siblings ...)
  2019-12-23 16:43 ` [Xen-devel] [PATCH 3/4] mm: Use put_old_guest_table for relinquish_pages George Dunlap
@ 2019-12-23 16:43 ` George Dunlap
  2020-01-03 15:51   ` Jan Beulich
  3 siblings, 1 reply; 11+ messages in thread
From: George Dunlap @ 2019-12-23 16:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, George Dunlap, Jan Beulich

The comment about the "force-invalidate" loop gives two reasons for
its existence:

1. Breaking circular "linear pagetable" references

2. Cleaning up partially-validated pages.

The first reason been invalid since XSA-240: Since then it hasn't been
possible to generate circular linear pagetable references.

The second reason has been invalid since long before that: Before
calling relinquish_memory(), domain_relinquish_resources() calls
vcpu_destroy_pagetables() on each vcpu; this will in turn call
put_old_guest_table() on each vcpu.  The result should be that it's
not possible to have top-level partially-devalidated pagetables by the
time we get to relinquish_memory().

The loop, it turns out, was effectively there only to pick up
interrupted UNPIN operations of relinquish_memory() itself.  Now that
these are being handled by put_old_guest_table(), we can remove this
loop entirely.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
This patch has the advantage that it doesn't hide mis-accounted
partial pages anymore; but has the disadvantage that if there *is*
such a mis-accounting, it will become a resource leak (and thus an
XSA).  It might be a good idea to add another "clean up partial pages"
pass after all other pages have been cleaned up, with a suitable
ASSERT().  That way we have a higher chance of catching mis-accounting
during development, without risking opening up a memory / domain leak
in production.

CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/domain.c | 71 -------------------------------------------
 1 file changed, 71 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index b7968463cb..847a70302c 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1950,7 +1950,6 @@ static int relinquish_memory(
     struct domain *d, struct page_list_head *list, unsigned long type)
 {
     struct page_info  *page;
-    unsigned long     x, y;
     int               ret = 0;
 
     ret = put_old_guest_table(current);
@@ -2004,76 +2003,6 @@ static int relinquish_memory(
 
         put_page_alloc_ref(page);
 
-        /*
-         * Forcibly invalidate top-most, still valid page tables at this point
-         * to break circular 'linear page table' references as well as clean up
-         * partially validated pages. This is okay because MMU structures are
-         * not shared across domains and this domain is now dead. Thus top-most
-         * valid tables are not in use so a non-zero count means circular
-         * reference or partially validated.
-         */
-        y = page->u.inuse.type_info;
-        for ( ; ; )
-        {
-            x = y;
-            if ( likely((x & PGT_type_mask) != type) ||
-                 likely(!(x & (PGT_validated|PGT_partial))) )
-                break;
-
-            y = cmpxchg(&page->u.inuse.type_info, x,
-                        x & ~(PGT_validated|PGT_partial));
-            if ( likely(y == x) )
-            {
-                /* No need for atomic update of type_info here: noone else updates it. */
-                switch ( ret = devalidate_page(page, x, 1) )
-                {
-                case 0:
-                    break;
-                case -EINTR:
-                    page_list_add(page, list);
-                    page->u.inuse.type_info |= PGT_validated;
-                    if ( x & PGT_partial )
-                        put_page(page);
-                    put_page(page);
-                    ret = -ERESTART;
-                    goto out;
-                case -ERESTART:
-                    page_list_add(page, list);
-                    /*
-                     * PGT_partial holds a type ref and a general ref.
-                     * If we came in with PGT_partial set, then we 1)
-                     * don't need to grab an extra type count, and 2)
-                     * do need to drop the extra page ref we grabbed
-                     * at the top of the loop.  If we didn't come in
-                     * with PGT_partial set, we 1) do need to drab an
-                     * extra type count, but 2) can transfer the page
-                     * ref we grabbed above to it.
-                     *
-                     * Note that we must increment type_info before
-                     * setting PGT_partial.  Theoretically it should
-                     * be safe to drop the page ref before setting
-                     * PGT_partial, but do it afterwards just to be
-                     * extra safe.
-                     */
-                    if ( !(x & PGT_partial) )
-                        page->u.inuse.type_info++;
-                    smp_wmb();
-                    page->u.inuse.type_info |= PGT_partial;
-                    if ( x & PGT_partial )
-                        put_page(page);
-                    goto out;
-                default:
-                    BUG();
-                }
-                if ( x & PGT_partial )
-                {
-                    page->u.inuse.type_info--;
-                    put_page(page);
-                }
-                break;
-            }
-        }
-
         /* Put the page on the list and /then/ potentially free it. */
         page_list_add_tail(page, &d->arch.relmem_list);
         put_page(page);
-- 
2.24.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Xen-devel] [PATCH 1/4] xen: Remove trailing whitespace from time.c
  2019-12-23 16:43 ` [Xen-devel] [PATCH 1/4] xen: Remove trailing whitespace from time.c George Dunlap
@ 2019-12-27  8:02   ` Jan Beulich
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2019-12-27  8:02 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Andrew Cooper

(re-sending, as I still don't see the mail having appeared on the list)

On 23.12.2019 17:43, George Dunlap wrote:
> No functional changes.
> 
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Xen-devel] [PATCH 2/4] xen: Add 'synthetic' preemption check parameter
  2019-12-23 16:43 ` [Xen-devel] [PATCH 2/4] xen: Add 'synthetic' preemption check parameter George Dunlap
@ 2019-12-27 13:57   ` Andrew Cooper
  2019-12-27 15:11   ` Julien Grall
  2020-01-03 12:24   ` Jan Beulich
  2 siblings, 0 replies; 11+ messages in thread
From: Andrew Cooper @ 2019-12-27 13:57 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: Jan Beulich

On 23/12/2019 16:43, George Dunlap wrote:
> In order to better test hypervisor preemption paths, add an option to
> artificially increase the number of preemptions.
>
> While modifying xen-command-line.pandoc, escape some underscores

This is pandoc, not markdown, and underscores like these are one of the
differences.  I specifically took these underscores out so the HTML
anchor links work correctly.

> , and
> remove some trailing whitespace.
>
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> ---
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Jan Beulich <jbeulich@suse.com>
> ---
>  docs/misc/xen-command-line.pandoc | 20 ++++++++++++++++++--
>  xen/arch/x86/time.c               | 11 +++++++++++
>  xen/include/xen/sched.h           | 10 +++++++++-
>  3 files changed, 38 insertions(+), 3 deletions(-)
>
> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> index 981a5e2381..1a9fda8627 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -636,13 +636,29 @@ Available alternatives, with their meaning, are:
>  Specify the USB controller to use, either by instance number (when going
>  over the PCI busses sequentially) or by PCI device (must be on segment 0).
>  
> -### debug_stack_lines
> +### debug\_stack\_lines
>  > `= <integer>`
>  
>  > Default: `20`
>  
>  Limits the number lines printed in Xen stack traces.
>  
> +### debug-synthetic-preemption
> +> `= <integer>`
> +
> +> Default: `0`
> +
> +Artificially increases rate at which `hypercall_preempt_check()`
> +returns `true`, for debugging purposes, to a rate of one in `N`. (The
> +default, `0`, disables the feature.)
> +
> +When promoting pagetables, for instance, `hypercall_preempt_check()`
> +is called before processing each PTE.  Since there are 512 PTEs per
> +page, a value of `1024` should result in pagetable promotion being
> +interrupted every other page on average.
> +
> +Only available in DEBUG builds.

Please, not like this.  I learnt the hard way with hvm_fep that it is
important to have functionality like this in release builds as well.

Give it a Kconfig option, and default it to DEBUG.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Xen-devel] [PATCH 2/4] xen: Add 'synthetic' preemption check parameter
  2019-12-23 16:43 ` [Xen-devel] [PATCH 2/4] xen: Add 'synthetic' preemption check parameter George Dunlap
  2019-12-27 13:57   ` Andrew Cooper
@ 2019-12-27 15:11   ` Julien Grall
  2020-01-03 12:24   ` Jan Beulich
  2 siblings, 0 replies; 11+ messages in thread
From: Julien Grall @ 2019-12-27 15:11 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Jan Beulich, Andrew Cooper


[-- Attachment #1.1: Type: text/plain, Size: 4579 bytes --]

Hi George,

I was expecting a bigger list of CC here. What did you use to compute it?

On Mon, 23 Dec 2019, 17:45 George Dunlap, <george.dunlap@citrix.com> wrote:

> In order to better test hypervisor preemption paths, add an option to
> artificially increase the number of preemptions.
>
> While modifying xen-command-line.pandoc, escape some underscores, and
> remove some trailing whitespace.
>
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> ---
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Jan Beulich <jbeulich@suse.com>
> ---
>  docs/misc/xen-command-line.pandoc | 20 ++++++++++++++++++--
>  xen/arch/x86/time.c               | 11 +++++++++++
>  xen/include/xen/sched.h           | 10 +++++++++-
>  3 files changed, 38 insertions(+), 3 deletions(-)
>
> diff --git a/docs/misc/xen-command-line.pandoc
> b/docs/misc/xen-command-line.pandoc
> index 981a5e2381..1a9fda8627 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -636,13 +636,29 @@ Available alternatives, with their meaning, are:
>  Specify the USB controller to use, either by instance number (when going
>  over the PCI busses sequentially) or by PCI device (must be on segment 0).
>
> -### debug_stack_lines
> +### debug\_stack\_lines
>  > `= <integer>`
>
>  > Default: `20`
>
>  Limits the number lines printed in Xen stack traces.
>
> +### debug-synthetic-preemption
> +> `= <integer>`
> +
> +> Default: `0`
> +
> +Artificially increases rate at which `hypercall_preempt_check()`
> +returns `true`, for debugging purposes, to a rate of one in `N`. (The
> +default, `0`, disables the feature.)
> +
> +When promoting pagetables, for instance, `hypercall_preempt_check()`
> +is called before processing each PTE.  Since there are 512 PTEs per
> +page, a value of `1024` should result in pagetable promotion being
> +interrupted every other page on average.
> +
> +Only available in DEBUG builds.
> +
>  ### debugtrace
>  > `= [cpu:]<size>`
>
> @@ -1690,7 +1706,7 @@ The following resources are available:
>      CDP, one COS will corespond two CBMs other than one with CAT, due to
> the
>      sum of CBMs is fixed, that means actual `cos_max` in use will
> automatically
>      reduce to half when CDP is enabled.
> -
> +
>  ### pv-linear-pt (x86)
>  > `= <boolean>`
>
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index 64e471a39b..34302f81e7 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -43,6 +43,17 @@
>  static char __initdata opt_clocksource[10];
>  string_param("clocksource", opt_clocksource);
>
> +#ifndef NDEBUG
> +int debug_synthetic_preemption = 0;
> +integer_param("debug-synthetic-preemption", debug_synthetic_preemption);
> +
> +bool synthetic_preemption_check(void) {
> +    if ( debug_synthetic_preemption == 0 )
> +        return false;
> +    return !(rdtsc() % debug_synthetic_preemption);
> +}
> +#endif

+
>  unsigned long __read_mostly cpu_khz;  /* CPU clock frequency in kHz. */
>  DEFINE_SPINLOCK(rtc_lock);
>  unsigned long pit0_ticks;
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index 9f7bc69293..c0071eee04 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -748,6 +748,13 @@ static inline void
> hypercall_cancel_continuation(struct vcpu *v)
>      v->hcall_preempted = false;
>  }
>
> +#ifndef NDEBUG
> +bool synthetic_preemption_check(void);
> +#define synthetic_preemption_check synthetic_preemption_check


Why do you need this define?

+#else
> +#define synthetic_preempiton_check() false
>

Typo in the name. Also, it seems like this wasn't tested on Arm and, AFAICT
break because the function would not be definr in debug build.

But, I am not sure why the implementation needs to be arch specific when
get_cycles() could do the job.

+#endif
> +
>  /*
>   * For long-running operations that must be in hypercall context, check
>   * if there is background work to be done that should interrupt this
> @@ -755,7 +762,8 @@ static inline void
> hypercall_cancel_continuation(struct vcpu *v)
>   */
>  #define hypercall_preempt_check() (unlikely(    \
>          softirq_pending(smp_processor_id()) |   \
> -        local_events_need_delivery()            \
> +        local_events_need_delivery() |          \
> +        synthetic_preemption_check()            \


The function you return bool, so shouldn't it be ||?

Cheers,

     ))
>
>  /*
> --
> 2.24.0
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel

[-- Attachment #1.2: Type: text/html, Size: 7026 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Xen-devel] [PATCH 2/4] xen: Add 'synthetic' preemption check parameter
  2019-12-23 16:43 ` [Xen-devel] [PATCH 2/4] xen: Add 'synthetic' preemption check parameter George Dunlap
  2019-12-27 13:57   ` Andrew Cooper
  2019-12-27 15:11   ` Julien Grall
@ 2020-01-03 12:24   ` Jan Beulich
  2 siblings, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2020-01-03 12:24 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Andrew Cooper

On 23.12.2019 17:43, George Dunlap wrote:
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -636,13 +636,29 @@ Available alternatives, with their meaning, are:
>  Specify the USB controller to use, either by instance number (when going
>  over the PCI busses sequentially) or by PCI device (must be on segment 0).
>  
> -### debug_stack_lines
> +### debug\_stack\_lines
>  > `= <integer>`
>  
>  > Default: `20`
>  
>  Limits the number lines printed in Xen stack traces.
>  
> +### debug-synthetic-preemption
> +> `= <integer>`
> +
> +> Default: `0`
> +
> +Artificially increases rate at which `hypercall_preempt_check()`
> +returns `true`, for debugging purposes, to a rate of one in `N`. (The
> +default, `0`, disables the feature.)
> +
> +When promoting pagetables, for instance, `hypercall_preempt_check()`
> +is called before processing each PTE.  Since there are 512 PTEs per
> +page, a value of `1024` should result in pagetable promotion being
> +interrupted every other page on average.

If this is meant to be x86 only, then text here should state this.
Otherwise I think it would help if the example described in the
last sentence would mention its x86 relationship.

> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -43,6 +43,17 @@
>  static char __initdata opt_clocksource[10];
>  string_param("clocksource", opt_clocksource);
>  
> +#ifndef NDEBUG
> +int debug_synthetic_preemption = 0;

static unsigned int __read_mostly?

> +integer_param("debug-synthetic-preemption", debug_synthetic_preemption);

Perhaps allow changing at runtime?

> +bool synthetic_preemption_check(void) {

Brace placement.

> +    if ( debug_synthetic_preemption == 0 )
> +        return false;
> +    return !(rdtsc() % debug_synthetic_preemption);

Please consistently use either ! or "== 0".

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Xen-devel] [PATCH 3/4] mm: Use put_old_guest_table for relinquish_pages
  2019-12-23 16:43 ` [Xen-devel] [PATCH 3/4] mm: Use put_old_guest_table for relinquish_pages George Dunlap
@ 2020-01-03 15:43   ` Jan Beulich
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2020-01-03 15:43 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Andrew Cooper

On 23.12.2019 17:43, George Dunlap wrote:
> @@ -1967,42 +1971,32 @@ static int relinquish_memory(
>          }
>  
>          if ( test_and_clear_bit(_PGT_pinned, &page->u.inuse.type_info) )
> -            ret = put_page_and_type_preemptible(page);
> +        {
> +            /* Always drop the page ref associated with PGT_pinned */
> +            put_page(page);
> +            ret = put_page_type_preemptible(page);
> +        }
>          switch ( ret )
>          {
>          case 0:
>              break;
> -        case -ERESTART:
>          case -EINTR:
> -            /*
> -             * -EINTR means PGT_validated has been re-set; re-set
> -             * PGT_pinned again so that it gets picked up next time
> -             * around.
> -             *
> -             * -ERESTART, OTOH, means PGT_partial is set instead.  Put
> -             * it back on the list, but don't set PGT_pinned; the
> -             * section below will finish off de-validation.  But we do
> -             * need to drop the general ref associated with
> -             * PGT_pinned, since put_page_and_type_preemptible()
> -             * didn't do it.
> -             *
> -             * NB we can do an ASSERT for PGT_validated, since we
> -             * "own" the type ref; but theoretically, the PGT_partial
> -             * could be cleared by someone else.
> -             */
> -            if ( ret == -EINTR )
> -            {
> -                ASSERT(page->u.inuse.type_info & PGT_validated);
> -                set_bit(_PGT_pinned, &page->u.inuse.type_info);
> -            }
> -            else
> -                put_page(page);
> +            ASSERT(page->u.inuse.type_info & PGT_validated);
> +            /* Fallthrough */
> +        case -ERESTART:
> +            current->arch.old_guest_ptpg = NULL;
> +            current->arch.old_guest_table = page;
> +            current->arch.old_guest_table_partial = (ret == -ERESTART);
>  
>              ret = -ERESTART;
>  
> -            /* Put the page back on the list and drop the ref we grabbed above */
> -            page_list_add(page, list);
> -            put_page(page);
> +            /* Make sure we don't lose track of the page */
> +            page_list_add_tail(page, &d->arch.relmem_list);

Why at the tail? The prior page_list_add() made sure we'd encounter
this page first on the subsequent continuation. No need to keep
(perhaps very) many pages in partially destructed state. With this
changed back (or the tail insertion suitably explained in the
description)
Reviewed-by: Jan Beulich <jbeulich@suse.com>

On top of this please consider latching current into a local
variable.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Xen-devel] [PATCH 4/4] x86/mm: Remove force-invalidate loop
  2019-12-23 16:43 ` [Xen-devel] [PATCH 4/4] x86/mm: Remove force-invalidate loop George Dunlap
@ 2020-01-03 15:51   ` Jan Beulich
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2020-01-03 15:51 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, Andrew Cooper

On 23.12.2019 17:43, George Dunlap wrote:
> The comment about the "force-invalidate" loop gives two reasons for
> its existence:
> 
> 1. Breaking circular "linear pagetable" references
> 
> 2. Cleaning up partially-validated pages.
> 
> The first reason been invalid since XSA-240: Since then it hasn't been
> possible to generate circular linear pagetable references.
> 
> The second reason has been invalid since long before that: Before
> calling relinquish_memory(), domain_relinquish_resources() calls
> vcpu_destroy_pagetables() on each vcpu; this will in turn call
> put_old_guest_table() on each vcpu.  The result should be that it's
> not possible to have top-level partially-devalidated pagetables by the
> time we get to relinquish_memory().

There's a subtle difference: Up in the enumeration you correctly
say "partially-validated pages". Down here you also correctly state
that put_old_guest_table() deals with partially-devalidated
pagetables. The latter indeed shouldn't be able to make it here,
but what about ones that have been partially validated (and hence
not put into ->arch.old_guest_table) while the domain was still
alive? (I think the code change is still correct, because
partially validated pages ought to be taken care of correctly, but
I'd prefer if the description matched the change.)

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-01-03 15:50 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-23 16:43 [Xen-devel] [PATCH 0/4] x86: Remove force-invalidate loop from relinqusish_memory George Dunlap
2019-12-23 16:43 ` [Xen-devel] [PATCH 1/4] xen: Remove trailing whitespace from time.c George Dunlap
2019-12-27  8:02   ` Jan Beulich
2019-12-23 16:43 ` [Xen-devel] [PATCH 2/4] xen: Add 'synthetic' preemption check parameter George Dunlap
2019-12-27 13:57   ` Andrew Cooper
2019-12-27 15:11   ` Julien Grall
2020-01-03 12:24   ` Jan Beulich
2019-12-23 16:43 ` [Xen-devel] [PATCH 3/4] mm: Use put_old_guest_table for relinquish_pages George Dunlap
2020-01-03 15:43   ` Jan Beulich
2019-12-23 16:43 ` [Xen-devel] [PATCH 4/4] x86/mm: Remove force-invalidate loop George Dunlap
2020-01-03 15:51   ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.