* [Xen-devel] [PATCH 0/8] xen: don't let keyhandlers block indefinitely on locks
@ 2020-02-13 12:54 Juergen Gross
  2020-02-13 12:54 ` [Xen-devel] [PATCH 1/8] xen: make rangeset_printk() static Juergen Gross
                   ` (8 more replies)
  0 siblings, 9 replies; 26+ messages in thread
From: Juergen Gross @ 2020-02-13 12:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Kevin Tian, Stefano Stabellini, Julien Grall,
	Jun Nakajima, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Ian Jackson, Dario Faggioli, Ross Lagerwall,
	Meng Xu, Jan Beulich, Roger Pau Monné

Keyhandlers dumping hypervisor information to the console often need
to take locks while accessing data. In order not to block in case of
system inconsistencies it is convenient to use trylock variants when
obtaining the locks. On the other hand a busy system might easily
encounter held locks, so this patch series adds special trylock
variants with a timeout for use by keyhandlers.

Only the keyhandlers dumping out data are modified to use those new
lock variants. There might still be some potentially blocking locks
involved (e.g. the console lock for printing), but the most critical
cases should be handled by this series.

The timeout per lock defaults to 1 millisecond; the value can be
changed via a boot-time or runtime parameter.
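
As an illustration of the intended usage pattern (a sketch only, with a
hypothetical example_lock and dump_example(); the real helpers are
introduced in patch 4):

    static void dump_example(unsigned char key)
    {
        /* Bounded wait: gives up after the configured timeout. */
        if ( !keyhandler_spin_lock(&example_lock,
                                   "could not get example lock") )
            return;

        /* ... dump the data protected by example_lock ... */

        spin_unlock(&example_lock);
    }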

Patch 1: cleanup (can go in independently of all other patches)
Patch 2: fix in keyhandlers where domlist_read_lock was not taken when
         using for_each_domain() (can go in independently)
Patch 3: cleanup of lock handling in scheduling code (prerequisite
         patch for patch 5)
Patch 4: main infrastructure for trylocks with timeout plus adaption
         of keyhandlers directly in keyhandler.c
Patches 5-8: adaptions of other keyhandlers (split roughly by path)

Juergen Gross (8):
  xen: make rangeset_printk() static
  xen: add using domlist_read_lock in keyhandlers
  xen/sched: don't use irqsave locks in dumping functions
  xen: add locks with timeouts for keyhandlers
  xen/sched: use keyhandler locks when dumping data to console
  xen/common: use keyhandler locks when dumping data to console
  xen/drivers: use keyhandler locks when dumping data to console
  xen/x86: use keyhandler locks when dumping data to console

 docs/misc/xen-command-line.pandoc        |  9 ++++++
 xen/arch/x86/domain.c                    |  9 ++++--
 xen/arch/x86/io_apic.c                   | 53 ++++++++++++++++++++++++--------
 xen/arch/x86/irq.c                       |  5 ++-
 xen/arch/x86/mm/p2m-ept.c                |  4 +++
 xen/arch/x86/msi.c                       |  4 ++-
 xen/arch/x86/numa.c                      | 16 ++++++----
 xen/arch/x86/time.c                      |  5 +++
 xen/common/event_channel.c               |  3 +-
 xen/common/grant_table.c                 | 39 +++++++++++++++++++++--
 xen/common/keyhandler.c                  | 29 ++++++++++++++++-
 xen/common/livepatch.c                   | 11 ++-----
 xen/common/rangeset.c                    | 10 +++---
 xen/common/sched/core.c                  |  7 +++++
 xen/common/sched/cpupool.c               |  4 ++-
 xen/common/sched/credit.c                | 31 ++++++++++++-------
 xen/common/sched/credit2.c               | 20 +++++++-----
 xen/common/sched/null.c                  | 50 +++++++++++++++++-------------
 xen/common/sched/private.h               |  1 +
 xen/common/sched/rt.c                    | 13 ++++----
 xen/common/spinlock.c                    | 18 +++++++++--
 xen/common/timer.c                       | 15 +++++----
 xen/drivers/passthrough/amd/iommu_intr.c | 14 ++++++---
 xen/drivers/passthrough/iommu.c          |  5 +++
 xen/drivers/passthrough/pci.c            | 14 +++++++--
 xen/drivers/vpci/msi.c                   |  5 ++-
 xen/include/xen/keyhandler.h             | 26 ++++++++++++++++
 xen/include/xen/rangeset.h               |  2 --
 xen/include/xen/rwlock.h                 | 37 ++++++++++++++++++++++
 29 files changed, 352 insertions(+), 107 deletions(-)

-- 
2.16.4



* [Xen-devel] [PATCH 1/8] xen: make rangeset_printk() static
  2020-02-13 12:54 [Xen-devel] [PATCH 0/8] xen: don't let keyhandlers block indefinitely on locks Juergen Gross
@ 2020-02-13 12:54 ` Juergen Gross
  2020-02-13 14:00   ` Jan Beulich
  2020-02-13 12:54 ` [Xen-devel] [PATCH 2/8] xen: add using domlist_read_lock in keyhandlers Juergen Gross
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Juergen Gross @ 2020-02-13 12:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Julien Grall, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Jan Beulich

rangeset_printk() is only used locally, so it can be made static.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/rangeset.c      | 3 +--
 xen/include/xen/rangeset.h | 2 --
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
index f34cafdc7e..4ebba30ba3 100644
--- a/xen/common/rangeset.c
+++ b/xen/common/rangeset.c
@@ -541,8 +541,7 @@ static void print_limit(struct rangeset *r, unsigned long s)
     printk((r->flags & RANGESETF_prettyprint_hex) ? "%lx" : "%lu", s);
 }
 
-void rangeset_printk(
-    struct rangeset *r)
+static void rangeset_printk(struct rangeset *r)
 {
     int nr_printed = 0;
     struct range *x;
diff --git a/xen/include/xen/rangeset.h b/xen/include/xen/rangeset.h
index 0c05c2fd4e..5f62a97971 100644
--- a/xen/include/xen/rangeset.h
+++ b/xen/include/xen/rangeset.h
@@ -95,8 +95,6 @@ bool_t __must_check rangeset_contains_singleton(
 void rangeset_swap(struct rangeset *a, struct rangeset *b);
 
 /* Rangeset pretty printing. */
-void rangeset_printk(
-    struct rangeset *r);
 void rangeset_domain_printk(
     struct domain *d);
 
-- 
2.16.4



* [Xen-devel] [PATCH 2/8] xen: add using domlist_read_lock in keyhandlers
  2020-02-13 12:54 [Xen-devel] [PATCH 0/8] xen: don't let keyhandlers block indefinitely on locks Juergen Gross
  2020-02-13 12:54 ` [Xen-devel] [PATCH 1/8] xen: make rangeset_printk() static Juergen Gross
@ 2020-02-13 12:54 ` Juergen Gross
  2020-02-13 14:01   ` Jan Beulich
                     ` (2 more replies)
  2020-02-13 12:54 ` [Xen-devel] [PATCH 3/8] xen/sched: don't use irqsave locks in dumping functions Juergen Gross
                   ` (6 subsequent siblings)
  8 siblings, 3 replies; 26+ messages in thread
From: Juergen Gross @ 2020-02-13 12:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Kevin Tian, Stefano Stabellini, Julien Grall,
	Jun Nakajima, Wei Liu, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Ian Jackson, Jan Beulich, Roger Pau Monné

Using for_each_domain() without holding the domlist_read_lock is
fragile, so add the lock in the keyhandlers where it is missing.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/mm/p2m-ept.c       | 4 ++++
 xen/arch/x86/time.c             | 5 +++++
 xen/common/grant_table.c        | 7 +++++++
 xen/drivers/passthrough/iommu.c | 5 +++++
 4 files changed, 21 insertions(+)

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index d4defa01c2..eb0f0edfef 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -1297,6 +1297,8 @@ static void ept_dump_p2m_table(unsigned char key)
     struct p2m_domain *p2m;
     struct ept_data *ept;
 
+    rcu_read_lock(&domlist_read_lock);
+
     for_each_domain(d)
     {
         if ( !hap_enabled(d) )
@@ -1347,6 +1349,8 @@ static void ept_dump_p2m_table(unsigned char key)
             unmap_domain_page(table);
         }
     }
+
+    rcu_read_unlock(&domlist_read_lock);
 }
 
 void setup_ept_dump(void)
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index cf3e51fb5e..509679235d 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2401,6 +2401,9 @@ static void dump_softtsc(unsigned char key)
     } else
         printk("TSC not marked as either constant or reliable, "
                "warp=%lu (count=%lu)\n", tsc_max_warp, tsc_check_count);
+
+    rcu_read_lock(&domlist_read_lock);
+
     for_each_domain ( d )
     {
         if ( is_hardware_domain(d) && d->arch.tsc_mode == TSC_MODE_DEFAULT )
@@ -2417,6 +2420,8 @@ static void dump_softtsc(unsigned char key)
         domcnt++;
     }
 
+    rcu_read_unlock(&domlist_read_lock);
+
     if ( !domcnt )
             printk("No domains have emulated TSC\n");
 }
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 2ecf38dfbe..c793927cd6 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -4104,9 +4104,16 @@ static void gnttab_usage_print(struct domain *rd)
 static void gnttab_usage_print_all(unsigned char key)
 {
     struct domain *d;
+
     printk("%s [ key '%c' pressed\n", __func__, key);
+
+    rcu_read_lock(&domlist_read_lock);
+
     for_each_domain ( d )
         gnttab_usage_print(d);
+
+    rcu_read_unlock(&domlist_read_lock);
+
     printk("%s ] done\n", __func__);
 }
 
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 9d421e06de..cab7a068aa 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -591,6 +591,9 @@ static void iommu_dump_p2m_table(unsigned char key)
     }
 
     ops = iommu_get_ops();
+
+    rcu_read_lock(&domlist_read_lock);
+
     for_each_domain(d)
     {
         if ( is_hardware_domain(d) || !is_iommu_enabled(d) )
@@ -605,6 +608,8 @@ static void iommu_dump_p2m_table(unsigned char key)
         printk("\ndomain%d IOMMU p2m table: \n", d->domain_id);
         ops->dump_p2m_table(d);
     }
+
+    rcu_read_unlock(&domlist_read_lock);
 }
 
 /*
-- 
2.16.4



* [Xen-devel] [PATCH 3/8] xen/sched: don't use irqsave locks in dumping functions
  2020-02-13 12:54 [Xen-devel] [PATCH 0/8] xen: don't let keyhandlers block indefinitely on locks Juergen Gross
  2020-02-13 12:54 ` [Xen-devel] [PATCH 1/8] xen: make rangeset_printk() static Juergen Gross
  2020-02-13 12:54 ` [Xen-devel] [PATCH 2/8] xen: add using domlist_read_lock in keyhandlers Juergen Gross
@ 2020-02-13 12:54 ` Juergen Gross
  2020-02-19 12:40   ` Dario Faggioli
  2020-02-19 14:27   ` Jan Beulich
  2020-02-13 12:54 ` [Xen-devel] [PATCH 4/8] xen: add locks with timeouts for keyhandlers Juergen Gross
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 26+ messages in thread
From: Juergen Gross @ 2020-02-13 12:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Meng Xu, Dario Faggioli

All dumping functions invoked by the "runq" keyhandler are called with
interrupts disabled, so there is no need to use the irqsave variants
of any locks in those functions.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched/credit.c  | 10 ++++------
 xen/common/sched/credit2.c |  5 ++---
 xen/common/sched/null.c    | 10 ++++------
 xen/common/sched/rt.c      | 10 ++++------
 4 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/xen/common/sched/credit.c b/xen/common/sched/credit.c
index 05946eea6e..dee87e7fe2 100644
--- a/xen/common/sched/credit.c
+++ b/xen/common/sched/credit.c
@@ -2048,7 +2048,6 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
     const struct csched_pcpu *spc;
     const struct csched_unit *svc;
     spinlock_t *lock;
-    unsigned long flags;
     int loop;
 
     /*
@@ -2058,7 +2057,7 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
      * - we scan through the runqueue, so we need the proper runqueue
      *   lock (the one of the runqueue of this cpu).
      */
-    spin_lock_irqsave(&prv->lock, flags);
+    spin_lock(&prv->lock);
     lock = pcpu_schedule_lock(cpu);
 
     spc = CSCHED_PCPU(cpu);
@@ -2089,7 +2088,7 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
     }
 
     pcpu_schedule_unlock(lock, cpu);
-    spin_unlock_irqrestore(&prv->lock, flags);
+    spin_unlock(&prv->lock);
 }
 
 static void
@@ -2098,9 +2097,8 @@ csched_dump(const struct scheduler *ops)
     struct list_head *iter_sdom, *iter_svc;
     struct csched_private *prv = CSCHED_PRIV(ops);
     int loop;
-    unsigned long flags;
 
-    spin_lock_irqsave(&prv->lock, flags);
+    spin_lock(&prv->lock);
 
     printk("info:\n"
            "\tncpus              = %u\n"
@@ -2153,7 +2151,7 @@ csched_dump(const struct scheduler *ops)
         }
     }
 
-    spin_unlock_irqrestore(&prv->lock, flags);
+    spin_unlock(&prv->lock);
 }
 
 static int __init
diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
index f2752f27e2..e76d2ed543 100644
--- a/xen/common/sched/credit2.c
+++ b/xen/common/sched/credit2.c
@@ -3649,14 +3649,13 @@ csched2_dump(const struct scheduler *ops)
 {
     struct list_head *iter_sdom;
     struct csched2_private *prv = csched2_priv(ops);
-    unsigned long flags;
     unsigned int i, j, loop;
 
     /*
      * We need the private scheduler lock as we access global
      * scheduler data and (below) the list of active domains.
      */
-    read_lock_irqsave(&prv->lock, flags);
+    read_lock(&prv->lock);
 
     printk("Active queues: %d\n"
            "\tdefault-weight     = %d\n",
@@ -3749,7 +3748,7 @@ csched2_dump(const struct scheduler *ops)
         spin_unlock(&rqd->lock);
     }
 
-    read_unlock_irqrestore(&prv->lock, flags);
+    read_unlock(&prv->lock);
 }
 
 static void *
diff --git a/xen/common/sched/null.c b/xen/common/sched/null.c
index 8c3101649d..3b31703d7e 100644
--- a/xen/common/sched/null.c
+++ b/xen/common/sched/null.c
@@ -954,9 +954,8 @@ static void null_dump_pcpu(const struct scheduler *ops, int cpu)
     const struct null_pcpu *npc = get_sched_res(cpu)->sched_priv;
     const struct null_unit *nvc;
     spinlock_t *lock;
-    unsigned long flags;
 
-    lock = pcpu_schedule_lock_irqsave(cpu, &flags);
+    lock = pcpu_schedule_lock(cpu);
 
     printk("CPU[%02d] sibling={%*pbl}, core={%*pbl}",
            cpu, CPUMASK_PR(per_cpu(cpu_sibling_mask, cpu)),
@@ -974,17 +973,16 @@ static void null_dump_pcpu(const struct scheduler *ops, int cpu)
         printk("\n");
     }
 
-    pcpu_schedule_unlock_irqrestore(lock, flags, cpu);
+    pcpu_schedule_unlock(lock, cpu);
 }
 
 static void null_dump(const struct scheduler *ops)
 {
     struct null_private *prv = null_priv(ops);
     struct list_head *iter;
-    unsigned long flags;
     unsigned int loop;
 
-    spin_lock_irqsave(&prv->lock, flags);
+    spin_lock(&prv->lock);
 
     printk("\tcpus_free = %*pbl\n", CPUMASK_PR(&prv->cpus_free));
 
@@ -1029,7 +1027,7 @@ static void null_dump(const struct scheduler *ops)
     printk("\n");
     spin_unlock(&prv->waitq_lock);
 
-    spin_unlock_irqrestore(&prv->lock, flags);
+    spin_unlock(&prv->lock);
 }
 
 static const struct scheduler sched_null_def = {
diff --git a/xen/common/sched/rt.c b/xen/common/sched/rt.c
index 66585ed50a..16379cb2d2 100644
--- a/xen/common/sched/rt.c
+++ b/xen/common/sched/rt.c
@@ -353,9 +353,8 @@ rt_dump_pcpu(const struct scheduler *ops, int cpu)
 {
     struct rt_private *prv = rt_priv(ops);
     const struct rt_unit *svc;
-    unsigned long flags;
 
-    spin_lock_irqsave(&prv->lock, flags);
+    spin_lock(&prv->lock);
     printk("CPU[%02d]\n", cpu);
     /* current UNIT (nothing to say if that's the idle unit). */
     svc = rt_unit(curr_on_cpu(cpu));
@@ -363,7 +362,7 @@ rt_dump_pcpu(const struct scheduler *ops, int cpu)
     {
         rt_dump_unit(ops, svc);
     }
-    spin_unlock_irqrestore(&prv->lock, flags);
+    spin_unlock(&prv->lock);
 }
 
 static void
@@ -373,9 +372,8 @@ rt_dump(const struct scheduler *ops)
     struct rt_private *prv = rt_priv(ops);
     const struct rt_unit *svc;
     const struct rt_dom *sdom;
-    unsigned long flags;
 
-    spin_lock_irqsave(&prv->lock, flags);
+    spin_lock(&prv->lock);
 
     if ( list_empty(&prv->sdom) )
         goto out;
@@ -421,7 +419,7 @@ rt_dump(const struct scheduler *ops)
     }
 
  out:
-    spin_unlock_irqrestore(&prv->lock, flags);
+    spin_unlock(&prv->lock);
 }
 
 /*
-- 
2.16.4



* [Xen-devel] [PATCH 4/8] xen: add locks with timeouts for keyhandlers
  2020-02-13 12:54 [Xen-devel] [PATCH 0/8] xen: don't let keyhandlers block indefinitely on locks Juergen Gross
                   ` (2 preceding siblings ...)
  2020-02-13 12:54 ` [Xen-devel] [PATCH 3/8] xen/sched: don't use irqsave locks in dumping functions Juergen Gross
@ 2020-02-13 12:54 ` Juergen Gross
  2020-03-05 15:25   ` Jan Beulich
  2020-02-13 12:54 ` [Xen-devel] [PATCH 5/8] xen/sched: use keyhandler locks when dumping data to console Juergen Gross
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Juergen Gross @ 2020-02-13 12:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Julien Grall, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Jan Beulich, Roger Pau Monné

Most keyhandlers are used to dump hypervisor data to the console,
mostly for debugging purposes. In those cases it might happen that
some data structures are locked, blocking the handler from accessing
the data.

In order to still be able to get some information, don't use plain
locking functions in the keyhandlers, but a trylock variant with a
timeout value. This allows waiting for some time and giving up in case
the lock could not be obtained.

Add the main infrastructure for this feature, including a new runtime
parameter for specifying the timeout value in milliseconds.

Use the new locking scheme in the handlers defined in keyhandler.c.
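
As an example, a subsystem-specific trylock with timeout can be built
on top of the keyhandler_lock_body() macro like this (sketch only, with
example_trylock() standing in for the subsystem's trylock primitive):

    static bool keyhandler_example_lock(example_lock_t *lock)
    {
        keyhandler_lock_body(bool, example_trylock(lock),
                             "could not get example lock\n");
    }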

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 docs/misc/xen-command-line.pandoc |  9 +++++++++
 xen/arch/x86/domain.c             |  9 +++++++--
 xen/common/keyhandler.c           | 29 ++++++++++++++++++++++++++++-
 xen/common/rangeset.c             |  7 +++++--
 xen/include/xen/keyhandler.h      | 26 ++++++++++++++++++++++++++
 5 files changed, 75 insertions(+), 5 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 5051583a5d..ee3d031771 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1384,6 +1384,15 @@ Force the use of `[<seg>:]<bus>:<device>.<func>` as device ID of IO-APIC
 `<ioapic>` instead of the one specified by the IVHD sub-tables of the IVRS
 ACPI table.
 
+### keyhandler-lock-timeout
+> `= <integer>`
+
+> Default: `1`
+
+> Can be modified at runtime
+
+Specify the lock timeout of keyhandlers in milliseconds.
+
 ### lapic (x86)
 > `= <boolean>`
 
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index f53ae5ff86..1d09911dc0 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -22,6 +22,7 @@
 #include <xen/grant_table.h>
 #include <xen/iocap.h>
 #include <xen/kernel.h>
+#include <xen/keyhandler.h>
 #include <xen/hypercall.h>
 #include <xen/multicall.h>
 #include <xen/irq.h>
@@ -222,7 +223,8 @@ void dump_pageframe_info(struct domain *d)
     {
         printk("    DomPage list too long to display\n");
     }
-    else
+    else if ( keyhandler_spin_lock(&d->page_alloc_lock,
+                                   "could not read page_list") )
     {
         unsigned long total[MASK_EXTR(PGT_type_mask, PGT_type_mask) + 1] = {};
 
@@ -251,7 +253,10 @@ void dump_pageframe_info(struct domain *d)
     if ( is_hvm_domain(d) )
         p2m_pod_dump_data(d);
 
-    spin_lock(&d->page_alloc_lock);
+    if ( !keyhandler_spin_lock(&d->page_alloc_lock,
+                               "could not read page_list") )
+        return;
+
     page_list_for_each ( page, &d->xenpage_list )
     {
         printk("    XenPage %p: caf=%08lx, taf=%" PRtype_info "\n",
diff --git a/xen/common/keyhandler.c b/xen/common/keyhandler.c
index f50490d0f3..c393d83b70 100644
--- a/xen/common/keyhandler.c
+++ b/xen/common/keyhandler.c
@@ -14,8 +14,10 @@
 #include <xen/rangeset.h>
 #include <xen/compat.h>
 #include <xen/ctype.h>
+#include <xen/param.h>
 #include <xen/perfc.h>
 #include <xen/mm.h>
+#include <xen/time.h>
 #include <xen/watchdog.h>
 #include <xen/init.h>
 #include <asm/debugger.h>
@@ -71,6 +73,30 @@ static struct keyhandler {
 #undef KEYHANDLER
 };
 
+static unsigned int lock_timeout = 1;
+integer_runtime_param("keyhandler-lock-timeout", lock_timeout);
+
+s_time_t keyhandler_lock_timeout(void)
+{
+    return NOW() + MILLISECS(lock_timeout);
+}
+
+bool keyhandler_spin_lock(spinlock_t *lock, const char *msg)
+{
+    keyhandler_lock_body(bool, spin_trylock(lock), "%s\n", msg);
+}
+
+bool keyhandler_spin_lock_irqsave(spinlock_t *lock, unsigned long *flags,
+                                  const char *msg)
+{
+    keyhandler_lock_body(bool, spin_trylock_irqsave(lock, *flags), "%s\n", msg);
+}
+
+bool keyhandler_read_lock(rwlock_t *lock, const char *msg)
+{
+    keyhandler_lock_body(bool, read_trylock(lock), "%s\n", msg);
+}
+
 static void keypress_action(void *unused)
 {
     handle_keypress(keypress_key, NULL);
@@ -378,7 +404,8 @@ static void read_clocks(unsigned char key)
     static u32 count = 0;
     static DEFINE_SPINLOCK(lock);
 
-    spin_lock(&lock);
+    if ( !keyhandler_spin_lock(&lock, "could not read clock stats") )
+        return;
 
     smp_call_function(read_clocks_slave, NULL, 0);
 
diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
index 4ebba30ba3..97104abb45 100644
--- a/xen/common/rangeset.c
+++ b/xen/common/rangeset.c
@@ -9,6 +9,7 @@
 
 #include <xen/sched.h>
 #include <xen/errno.h>
+#include <xen/keyhandler.h>
 #include <xen/rangeset.h>
 #include <xsm/xsm.h>
 
@@ -546,7 +547,8 @@ static void rangeset_printk(struct rangeset *r)
     int nr_printed = 0;
     struct range *x;
 
-    read_lock(&r->lock);
+    if ( !keyhandler_read_lock(&r->lock, "could not read rangeset") )
+        return;
 
     printk("%-10s {", r->name);
 
@@ -575,7 +577,8 @@ void rangeset_domain_printk(
 
     printk("Rangesets belonging to domain %u:\n", d->domain_id);
 
-    spin_lock(&d->rangesets_lock);
+    if ( !keyhandler_spin_lock(&d->rangesets_lock, "could not get rangesets") )
+        return;
 
     if ( list_empty(&d->rangesets) )
         printk("    None\n");
diff --git a/xen/include/xen/keyhandler.h b/xen/include/xen/keyhandler.h
index 5131e86cbc..cc8e0b18f5 100644
--- a/xen/include/xen/keyhandler.h
+++ b/xen/include/xen/keyhandler.h
@@ -10,6 +10,9 @@
 #ifndef __XEN_KEYHANDLER_H__
 #define __XEN_KEYHANDLER_H__
 
+#include <xen/rwlock.h>
+#include <xen/spinlock.h>
+#include <xen/time.h>
 #include <xen/types.h>
 
 /*
@@ -48,4 +51,27 @@ void register_irq_keyhandler(unsigned char key,
 /* Inject a keypress into the key-handling subsystem. */
 extern void handle_keypress(unsigned char key, struct cpu_user_regs *regs);
 
+/* Locking primitives for inside keyhandlers (like trylock). */
+bool keyhandler_spin_lock(spinlock_t *lock, const char *msg);
+bool keyhandler_spin_lock_irqsave(spinlock_t *lock, unsigned long *flags,
+                                  const char *msg);
+bool keyhandler_read_lock(rwlock_t *lock, const char *msg);
+
+/* Primitives for custom keyhandler lock functions. */
+s_time_t keyhandler_lock_timeout(void);
+#define keyhandler_lock_body(type, lockfunc, arg...) \
+    s_time_t end = keyhandler_lock_timeout();        \
+    type ret;                                        \
+                                                     \
+    do {                                             \
+        ret = lockfunc;                              \
+        if ( ret )                                   \
+            return ret;                              \
+        cpu_relax();                                 \
+    } while ( NOW() < end );                         \
+                                                     \
+    printk("-->lock conflict: " arg);                \
+                                                     \
+    return ret
+
 #endif /* __XEN_KEYHANDLER_H__ */
-- 
2.16.4



* [Xen-devel] [PATCH 5/8] xen/sched: use keyhandler locks when dumping data to console
  2020-02-13 12:54 [Xen-devel] [PATCH 0/8] xen: don't let keyhandlers block indefinitely on locks Juergen Gross
                   ` (3 preceding siblings ...)
  2020-02-13 12:54 ` [Xen-devel] [PATCH 4/8] xen: add locks with timeouts for keyhandlers Juergen Gross
@ 2020-02-13 12:54 ` Juergen Gross
  2020-02-19 14:31   ` Dario Faggioli
  2020-02-13 12:54 ` [Xen-devel] [PATCH 6/8] xen/common: " Juergen Gross
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Juergen Gross @ 2020-02-13 12:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, George Dunlap, Meng Xu, Dario Faggioli

Instead of using the normal locks, use the keyhandler-provided trylocks
with timeouts. This requires a special primitive for the per-cpu
scheduler lock.
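
The resulting pattern for dumping data protected by a per-cpu scheduler
lock is (sketch, cf. the hunks below):

    spinlock_t *lock = keyhandler_pcpu_lock(cpu);

    if ( lock )
    {
        /* ... dump state protected by the per-cpu scheduler lock ... */
        pcpu_schedule_unlock(lock, cpu);
    }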

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched/core.c    |  7 +++++++
 xen/common/sched/cpupool.c |  4 +++-
 xen/common/sched/credit.c  | 25 ++++++++++++++++++-------
 xen/common/sched/credit2.c | 17 +++++++++++------
 xen/common/sched/null.c    | 42 +++++++++++++++++++++++++-----------------
 xen/common/sched/private.h |  1 +
 xen/common/sched/rt.c      |  7 +++++--
 7 files changed, 70 insertions(+), 33 deletions(-)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index d4e8944e0e..7b8b0fe80e 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -21,6 +21,7 @@
 #include <xen/domain.h>
 #include <xen/delay.h>
 #include <xen/event.h>
+#include <xen/keyhandler.h>
 #include <xen/time.h>
 #include <xen/timer.h>
 #include <xen/perfc.h>
@@ -3302,6 +3303,12 @@ void __init sched_setup_dom0_vcpus(struct domain *d)
 }
 #endif
 
+spinlock_t *keyhandler_pcpu_lock(unsigned int cpu)
+{
+    keyhandler_lock_body(spinlock_t *, pcpu_schedule_trylock(cpu),
+                         "could not get pcpu lock, cpu=%u\n", cpu);
+}
+
 #ifdef CONFIG_COMPAT
 #include "compat.c"
 #endif
diff --git a/xen/common/sched/cpupool.c b/xen/common/sched/cpupool.c
index 476916c6ea..5c181e9772 100644
--- a/xen/common/sched/cpupool.c
+++ b/xen/common/sched/cpupool.c
@@ -893,7 +893,9 @@ void dump_runq(unsigned char key)
     s_time_t         now = NOW();
     struct cpupool **c;
 
-    spin_lock(&cpupool_lock);
+    if ( !keyhandler_spin_lock(&cpupool_lock, "could not get cpupools") )
+        return;
+
     local_irq_save(flags);
 
     printk("sched_smt_power_savings: %s\n",
diff --git a/xen/common/sched/credit.c b/xen/common/sched/credit.c
index dee87e7fe2..165ff26bb8 100644
--- a/xen/common/sched/credit.c
+++ b/xen/common/sched/credit.c
@@ -2057,8 +2057,15 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
      * - we scan through the runqueue, so we need the proper runqueue
      *   lock (the one of the runqueue of this cpu).
      */
-    spin_lock(&prv->lock);
-    lock = pcpu_schedule_lock(cpu);
+    if ( !keyhandler_spin_lock(&prv->lock, "could not get credit data") )
+        return;
+
+    lock = keyhandler_pcpu_lock(cpu);
+    if ( !lock )
+    {
+        spin_unlock(&prv->lock);
+        return;
+    }
 
     spc = CSCHED_PCPU(cpu);
     runq = &spc->runq;
@@ -2098,7 +2105,8 @@ csched_dump(const struct scheduler *ops)
     struct csched_private *prv = CSCHED_PRIV(ops);
     int loop;
 
-    spin_lock(&prv->lock);
+    if ( !keyhandler_spin_lock(&prv->lock, "could not get credit data") )
+        return;
 
     printk("info:\n"
            "\tncpus              = %u\n"
@@ -2142,12 +2150,15 @@ csched_dump(const struct scheduler *ops)
             spinlock_t *lock;
 
             svc = list_entry(iter_svc, struct csched_unit, active_unit_elem);
-            lock = unit_schedule_lock(svc->unit);
+            lock = keyhandler_pcpu_lock(svc->unit->res->master_cpu);
 
-            printk("\t%3d: ", ++loop);
-            csched_dump_unit(svc);
+            if ( lock )
+            {
+                printk("\t%3d: ", ++loop);
+                csched_dump_unit(svc);
 
-            unit_schedule_unlock(lock, svc->unit);
+                pcpu_schedule_unlock(lock, svc->unit->res->master_cpu);
+            }
         }
     }
 
diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
index e76d2ed543..28b03fe744 100644
--- a/xen/common/sched/credit2.c
+++ b/xen/common/sched/credit2.c
@@ -3655,7 +3655,8 @@ csched2_dump(const struct scheduler *ops)
      * We need the private scheduler lock as we access global
      * scheduler data and (below) the list of active domains.
      */
-    read_lock(&prv->lock);
+    if ( !keyhandler_read_lock(&prv->lock, "could not get credit2 data") )
+        return;
 
     printk("Active queues: %d\n"
            "\tdefault-weight     = %d\n",
@@ -3711,12 +3712,15 @@ csched2_dump(const struct scheduler *ops)
             struct csched2_unit * const svc = csched2_unit(unit);
             spinlock_t *lock;
 
-            lock = unit_schedule_lock(unit);
+            lock = keyhandler_pcpu_lock(unit->res->master_cpu);
 
-            printk("\t%3d: ", ++loop);
-            csched2_dump_unit(prv, svc);
+            if ( lock )
+            {
+                printk("\t%3d: ", ++loop);
+                csched2_dump_unit(prv, svc);
 
-            unit_schedule_unlock(lock, unit);
+                pcpu_schedule_unlock(lock, unit->res->master_cpu);
+            }
         }
     }
 
@@ -3727,7 +3731,8 @@ csched2_dump(const struct scheduler *ops)
         int loop = 0;
 
         /* We need the lock to scan the runqueue. */
-        spin_lock(&rqd->lock);
+        if ( !keyhandler_spin_lock(&rqd->lock, "could not get runq") )
+            continue;
 
         printk("Runqueue %d:\n", i);
 
diff --git a/xen/common/sched/null.c b/xen/common/sched/null.c
index 3b31703d7e..fe59ce17fe 100644
--- a/xen/common/sched/null.c
+++ b/xen/common/sched/null.c
@@ -28,6 +28,7 @@
  * if the scheduler is used inside a cpupool.
  */
 
+#include <xen/keyhandler.h>
 #include <xen/sched.h>
 #include <xen/softirq.h>
 #include <xen/trace.h>
@@ -982,7 +983,8 @@ static void null_dump(const struct scheduler *ops)
     struct list_head *iter;
     unsigned int loop;
 
-    spin_lock(&prv->lock);
+    if ( !keyhandler_spin_lock(&prv->lock, "could not get null data") )
+        return;
 
     printk("\tcpus_free = %*pbl\n", CPUMASK_PR(&prv->cpus_free));
 
@@ -1001,31 +1003,37 @@ static void null_dump(const struct scheduler *ops)
             struct null_unit * const nvc = null_unit(unit);
             spinlock_t *lock;
 
-            lock = unit_schedule_lock(unit);
+            lock = keyhandler_pcpu_lock(unit->res->master_cpu);
 
-            printk("\t%3d: ", ++loop);
-            dump_unit(prv, nvc);
-            printk("\n");
+            if ( lock )
+            {
+                printk("\t%3d: ", ++loop);
+                dump_unit(prv, nvc);
+                printk("\n");
 
-            unit_schedule_unlock(lock, unit);
+                pcpu_schedule_unlock(lock, unit->res->master_cpu);
+            }
         }
     }
 
     printk("Waitqueue: ");
     loop = 0;
-    spin_lock(&prv->waitq_lock);
-    list_for_each( iter, &prv->waitq )
+    if ( keyhandler_spin_lock(&prv->waitq_lock, "could not get waitq") )
     {
-        struct null_unit *nvc = list_entry(iter, struct null_unit, waitq_elem);
-
-        if ( loop++ != 0 )
-            printk(", ");
-        if ( loop % 24 == 0 )
-            printk("\n\t");
-        printk("%pdv%d", nvc->unit->domain, nvc->unit->unit_id);
+        list_for_each( iter, &prv->waitq )
+        {
+            struct null_unit *nvc = list_entry(iter, struct null_unit,
+                                               waitq_elem);
+
+            if ( loop++ != 0 )
+                printk(", ");
+            if ( loop % 24 == 0 )
+                printk("\n\t");
+            printk("%pdv%d", nvc->unit->domain, nvc->unit->unit_id);
+        }
+        printk("\n");
+        spin_unlock(&prv->waitq_lock);
     }
-    printk("\n");
-    spin_unlock(&prv->waitq_lock);
 
     spin_unlock(&prv->lock);
 }
diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
index 2a94179baa..6723f74d28 100644
--- a/xen/common/sched/private.h
+++ b/xen/common/sched/private.h
@@ -631,5 +631,6 @@ struct cpupool *cpupool_get_by_id(int poolid);
 void cpupool_put(struct cpupool *pool);
 int cpupool_add_domain(struct domain *d, int poolid);
 void cpupool_rm_domain(struct domain *d);
+spinlock_t *keyhandler_pcpu_lock(unsigned int cpu);
 
 #endif /* __XEN_SCHED_IF_H__ */
diff --git a/xen/common/sched/rt.c b/xen/common/sched/rt.c
index 16379cb2d2..d4b17e0f8b 100644
--- a/xen/common/sched/rt.c
+++ b/xen/common/sched/rt.c
@@ -354,7 +354,9 @@ rt_dump_pcpu(const struct scheduler *ops, int cpu)
     struct rt_private *prv = rt_priv(ops);
     const struct rt_unit *svc;
 
-    spin_lock(&prv->lock);
+    if ( !keyhandler_spin_lock(&prv->lock, "could not get rt data") )
+        return;
+
     printk("CPU[%02d]\n", cpu);
     /* current UNIT (nothing to say if that's the idle unit). */
     svc = rt_unit(curr_on_cpu(cpu));
@@ -373,7 +375,8 @@ rt_dump(const struct scheduler *ops)
     const struct rt_unit *svc;
     const struct rt_dom *sdom;
 
-    spin_lock(&prv->lock);
+    if ( !keyhandler_spin_lock(&prv->lock, "could not get rt data") )
+        return;
 
     if ( list_empty(&prv->sdom) )
         goto out;
-- 
2.16.4



* [Xen-devel] [PATCH 6/8] xen/common: use keyhandler locks when dumping data to console
  2020-02-13 12:54 [Xen-devel] [PATCH 0/8] xen: don't let keyhandlers block indefinitely on locks Juergen Gross
                   ` (4 preceding siblings ...)
  2020-02-13 12:54 ` [Xen-devel] [PATCH 5/8] xen/sched: use keyhandler locks when dumping data to console Juergen Gross
@ 2020-02-13 12:54 ` Juergen Gross
  2020-02-13 12:54 ` [Xen-devel] [PATCH 7/8] xen/drivers: " Juergen Gross
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Juergen Gross @ 2020-02-13 12:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Julien Grall, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Ross Lagerwall, Jan Beulich

Instead of using the normal locks, use the keyhandler-provided trylocks
with timeouts. This requires adding a percpu read_trylock and a special
primitive for the grant lock.
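
The read-side pattern for the grant table then becomes (sketch, with gt
being a struct grant_table pointer; the wrapper is added below):

    if ( grant_read_trylock(gt) )
    {
        /* ... read-side access to the grant table ... */
        grant_read_unlock(gt);
    }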

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/event_channel.c |  3 ++-
 xen/common/grant_table.c   | 32 +++++++++++++++++++++++++++++---
 xen/common/livepatch.c     | 11 +++--------
 xen/common/spinlock.c      | 18 +++++++++++++++---
 xen/common/timer.c         | 15 +++++++++------
 xen/include/xen/rwlock.h   | 37 +++++++++++++++++++++++++++++++++++++
 6 files changed, 95 insertions(+), 21 deletions(-)

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index e86e2bfab0..a8fd481cb8 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1387,7 +1387,8 @@ static void domain_dump_evtchn_info(struct domain *d)
            "Polling vCPUs: {%*pbl}\n"
            "    port [p/m/s]\n", d->domain_id, d->max_vcpus, d->poll_mask);
 
-    spin_lock(&d->event_lock);
+    if ( !keyhandler_spin_lock(&d->event_lock, "could not get event lock") )
+        return;
 
     for ( port = 1; port < d->max_evtchns; ++port )
     {
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index c793927cd6..14d01950ab 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -335,6 +335,11 @@ static inline void grant_read_lock(struct grant_table *gt)
     percpu_read_lock(grant_rwlock, &gt->lock);
 }
 
+static inline int grant_read_trylock(struct grant_table *gt)
+{
+    return percpu_read_trylock(grant_rwlock, &gt->lock);
+}
+
 static inline void grant_read_unlock(struct grant_table *gt)
 {
     percpu_read_unlock(grant_rwlock, &gt->lock);
@@ -4040,6 +4045,24 @@ int gnttab_get_status_frame(struct domain *d, unsigned long idx,
     return rc;
 }
 
+static int keyhandler_grant_read_lock(struct domain *d)
+{
+    keyhandler_lock_body(int, grant_read_trylock(d->grant_table),
+                         "could not get grant lock for %pd\n", d);
+}
+
+static inline struct active_grant_entry *
+keyhandler_active_entry_acquire(struct grant_table *t, grant_ref_t e)
+{
+    struct active_grant_entry *act;
+
+    act = &_active_entry(t, e);
+    if ( !keyhandler_spin_lock(&act->lock, "could not acquire active entry") )
+        return NULL;
+
+    return act;
+}
+
 static void gnttab_usage_print(struct domain *rd)
 {
     int first = 1;
@@ -4047,11 +4070,12 @@ static void gnttab_usage_print(struct domain *rd)
     struct grant_table *gt = rd->grant_table;
     unsigned int nr_ents;
 
+    if ( !keyhandler_grant_read_lock(rd) )
+        return;
+
     printk("      -------- active --------       -------- shared --------\n");
     printk("[ref] localdom mfn      pin          localdom gmfn     flags\n");
 
-    grant_read_lock(gt);
-
     printk("grant-table for remote d%d (v%u)\n"
            "  %u frames (%u max), %u maptrack frames (%u max)\n",
            rd->domain_id, gt->gt_version,
@@ -4066,7 +4090,9 @@ static void gnttab_usage_print(struct domain *rd)
         uint16_t status;
         uint64_t frame;
 
-        act = active_entry_acquire(gt, ref);
+        act = keyhandler_active_entry_acquire(gt, ref);
+        if ( !act )
+            continue;
         if ( !act->pin )
         {
             active_entry_release(act);
diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
index 5e09dc990b..0f0a877704 100644
--- a/xen/common/livepatch.c
+++ b/xen/common/livepatch.c
@@ -2072,11 +2072,8 @@ static void livepatch_printall(unsigned char key)
     if ( !xen_build_id(&binary_id, &len) )
         printk("build-id: %*phN\n", len, binary_id);
 
-    if ( !spin_trylock(&payload_lock) )
-    {
-        printk("Lock held. Try again.\n");
+    if ( !keyhandler_spin_lock(&payload_lock, "could not get payload lock") )
         return;
-    }
 
     list_for_each_entry ( data, &payload_list, list )
     {
@@ -2096,11 +2093,9 @@ static void livepatch_printall(unsigned char key)
             {
                 spin_unlock(&payload_lock);
                 process_pending_softirqs();
-                if ( !spin_trylock(&payload_lock) )
-                {
-                    printk("Couldn't reacquire lock. Try again.\n");
+                if ( !keyhandler_spin_lock(&payload_lock,
+                                           "could not reacquire payload lock") )
                     return;
-                }
             }
         }
         if ( data->id.len )
diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
index 344981c54a..3204d24dfa 100644
--- a/xen/common/spinlock.c
+++ b/xen/common/spinlock.c
@@ -349,17 +349,23 @@ static struct lock_profile_anc lock_profile_ancs[LOCKPROF_TYPE_N];
 static struct lock_profile_qhead lock_profile_glb_q;
 static spinlock_t lock_profile_lock = SPIN_LOCK_UNLOCKED;
 
-static void spinlock_profile_iterate(lock_profile_subfunc *sub, void *par)
+static void spinlock_profile_iterate_locked(lock_profile_subfunc *sub,
+                                            void *par)
 {
     int i;
     struct lock_profile_qhead *hq;
     struct lock_profile *eq;
 
-    spin_lock(&lock_profile_lock);
     for ( i = 0; i < LOCKPROF_TYPE_N; i++ )
         for ( hq = lock_profile_ancs[i].head_q; hq; hq = hq->head_q )
             for ( eq = hq->elem_q; eq; eq = eq->next )
                 sub(eq, i, hq->idx, par);
+}
+
+static void spinlock_profile_iterate(lock_profile_subfunc *sub, void *par)
+{
+    spin_lock(&lock_profile_lock);
+    spinlock_profile_iterate_locked(sub, par);
     spin_unlock(&lock_profile_lock);
 }
 
@@ -389,7 +395,13 @@ void spinlock_profile_printall(unsigned char key)
     diff = now - lock_profile_start;
     printk("Xen lock profile info SHOW  (now = %"PRI_stime" total = "
            "%"PRI_stime")\n", now, diff);
-    spinlock_profile_iterate(spinlock_profile_print_elem, NULL);
+
+    if ( !keyhandler_spin_lock(&lock_profile_lock, "could not get lock") )
+        return;
+
+    spinlock_profile_iterate_locked(spinlock_profile_print_elem, NULL);
+
+    spin_unlock(&lock_profile_lock);
 }
 
 static void spinlock_profile_reset_elem(struct lock_profile *data,
diff --git a/xen/common/timer.c b/xen/common/timer.c
index 1bb265ceea..0a00857e2d 100644
--- a/xen/common/timer.c
+++ b/xen/common/timer.c
@@ -561,12 +561,15 @@ static void dump_timerq(unsigned char key)
         ts = &per_cpu(timers, i);
 
         printk("CPU%02d:\n", i);
-        spin_lock_irqsave(&ts->lock, flags);
-        for ( j = 1; j <= heap_metadata(ts->heap)->size; j++ )
-            dump_timer(ts->heap[j], now);
-        for ( t = ts->list; t != NULL; t = t->list_next )
-            dump_timer(t, now);
-        spin_unlock_irqrestore(&ts->lock, flags);
+        if ( keyhandler_spin_lock_irqsave(&ts->lock, &flags,
+                                          "could not get lock") )
+        {
+            for ( j = 1; j <= heap_metadata(ts->heap)->size; j++ )
+                dump_timer(ts->heap[j], now);
+            for ( t = ts->list; t != NULL; t = t->list_next )
+                dump_timer(t, now);
+            spin_unlock_irqrestore(&ts->lock, flags);
+        }
     }
 }
 
diff --git a/xen/include/xen/rwlock.h b/xen/include/xen/rwlock.h
index 3dfea1ac2a..add8577429 100644
--- a/xen/include/xen/rwlock.h
+++ b/xen/include/xen/rwlock.h
@@ -278,6 +278,41 @@ static inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
     }
 }
 
+static inline int _percpu_read_trylock(percpu_rwlock_t **per_cpudata,
+                                         percpu_rwlock_t *percpu_rwlock)
+{
+    /* Validate the correct per_cpudata variable has been provided. */
+    _percpu_rwlock_owner_check(per_cpudata, percpu_rwlock);
+
+    /* We cannot support recursion on the same lock. */
+    ASSERT(this_cpu_ptr(per_cpudata) != percpu_rwlock);
+    /*
+     * Detect using a second percpu_rwlock_t simultaneously and fall back
+     * to standard read_trylock.
+     */
+    if ( unlikely(this_cpu_ptr(per_cpudata) != NULL) )
+        return read_trylock(&percpu_rwlock->rwlock);
+
+    /* Indicate this cpu is reading. */
+    this_cpu_ptr(per_cpudata) = percpu_rwlock;
+    smp_mb();
+    /* Check if a writer is waiting. */
+    if ( unlikely(percpu_rwlock->writer_activating) )
+    {
+        /* Let the waiting writer know we aren't holding the lock. */
+        this_cpu_ptr(per_cpudata) = NULL;
+        /* Try using the read lock to keep the lock fair. */
+        if ( !read_trylock(&percpu_rwlock->rwlock) )
+            return 0;
+        /* Set the per CPU data again and continue. */
+        this_cpu_ptr(per_cpudata) = percpu_rwlock;
+        /* Drop the read lock because we don't need it anymore. */
+        read_unlock(&percpu_rwlock->rwlock);
+    }
+
+    return 1;
+}
+
 static inline void _percpu_read_unlock(percpu_rwlock_t **per_cpudata,
                 percpu_rwlock_t *percpu_rwlock)
 {
@@ -318,6 +353,8 @@ static inline void _percpu_write_unlock(percpu_rwlock_t **per_cpudata,
 
 #define percpu_read_lock(percpu, lock) \
     _percpu_read_lock(&get_per_cpu_var(percpu), lock)
+#define percpu_read_trylock(percpu, lock) \
+    _percpu_read_trylock(&get_per_cpu_var(percpu), lock)
 #define percpu_read_unlock(percpu, lock) \
     _percpu_read_unlock(&get_per_cpu_var(percpu), lock)
 #define percpu_write_lock(percpu, lock) \
-- 
2.16.4



* [Xen-devel] [PATCH 7/8] xen/drivers: use keyhandler locks when dumping data to console
  2020-02-13 12:54 [Xen-devel] [PATCH 0/8] xen: don't let keyhandlers block indefinitely on locks Juergen Gross
                   ` (5 preceding siblings ...)
  2020-02-13 12:54 ` [Xen-devel] [PATCH 6/8] xen/common: " Juergen Gross
@ 2020-02-13 12:54 ` Juergen Gross
  2020-02-13 12:54 ` [Xen-devel] [PATCH 8/8] xen/x86: " Juergen Gross
  2020-02-13 18:38 ` [Xen-devel] [PATCH 0/8] xen: don't let keyhandlers block indefinitely on locks Andrew Cooper
  8 siblings, 0 replies; 26+ messages in thread
From: Juergen Gross @ 2020-02-13 12:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Julien Grall, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Jan Beulich

Instead of using the normal locks, use the keyhandler-provided trylocks
with timeouts. This requires adding a special primitive for the pcidevs
lock.
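
The resulting pattern for the global pcidevs lock is (sketch of the
usage added below):

    if ( keyhandler_pcidevs_lock() )
    {
        /* ... walk the PCI segments/devices ... */
        pcidevs_unlock();
    }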

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/drivers/passthrough/amd/iommu_intr.c | 14 ++++++++++----
 xen/drivers/passthrough/pci.c            | 14 +++++++++++---
 xen/drivers/vpci/msi.c                   |  5 ++++-
 3 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu_intr.c b/xen/drivers/passthrough/amd/iommu_intr.c
index e1cc13b873..753aaf3679 100644
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -16,6 +16,7 @@
  * along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <xen/keyhandler.h>
 #include <xen/softirq.h>
 
 #include <asm/io_apic.h>
@@ -886,9 +887,12 @@ static int dump_intremap_mapping(const struct amd_iommu *iommu,
     if ( !ivrs_mapping )
         return 0;
 
-    spin_lock_irqsave(&(ivrs_mapping->intremap_lock), flags);
-    dump_intremap_table(iommu, ivrs_mapping->intremap_table, ivrs_mapping);
-    spin_unlock_irqrestore(&(ivrs_mapping->intremap_lock), flags);
+    if ( keyhandler_spin_lock_irqsave(&(ivrs_mapping->intremap_lock), &flags,
+                                      "could not get intremap lock") )
+    {
+        dump_intremap_table(iommu, ivrs_mapping->intremap_table, ivrs_mapping);
+        spin_unlock_irqrestore(&(ivrs_mapping->intremap_lock), flags);
+    }
 
     process_pending_softirqs();
 
@@ -909,7 +913,9 @@ void amd_iommu_dump_intremap_tables(unsigned char key)
 
         printk("--- Dumping Shared IOMMU Interrupt Remapping Table ---\n");
 
-        spin_lock_irqsave(&shared_intremap_lock, flags);
+        if ( !keyhandler_spin_lock_irqsave(&shared_intremap_lock, &flags,
+                                           "could not get lock") )
+            return;
         dump_intremap_table(list_first_entry(&amd_iommu_head, struct amd_iommu,
                                              list),
                             shared_intremap_table, NULL);
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 5660f7e1c2..1fd998af3a 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1356,12 +1356,20 @@ static int _dump_pci_devices(struct pci_seg *pseg, void *arg)
     return 0;
 }
 
+static bool keyhandler_pcidevs_lock(void)
+{
+    keyhandler_lock_body(bool_t, pcidevs_trylock(),
+                         "could not get pcidevs lock\n");
+}
+
 static void dump_pci_devices(unsigned char ch)
 {
     printk("==== PCI devices ====\n");
-    pcidevs_lock();
-    pci_segments_iterate(_dump_pci_devices, NULL);
-    pcidevs_unlock();
+    if ( keyhandler_pcidevs_lock() )
+    {
+        pci_segments_iterate(_dump_pci_devices, NULL);
+        pcidevs_unlock();
+    }
 }
 
 static int __init setup_dump_pcidevs(void)
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index 75010762ed..31ea99b62e 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -16,6 +16,7 @@
  * License along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <xen/keyhandler.h>
 #include <xen/sched.h>
 #include <xen/softirq.h>
 #include <xen/vpci.h>
@@ -283,7 +284,9 @@ void vpci_dump_msi(void)
             const struct vpci_msi *msi;
             const struct vpci_msix *msix;
 
-            if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) )
+            if ( !pdev->vpci ||
+                 !keyhandler_spin_lock(&pdev->vpci->lock,
+                                       "could not get vpci lock") )
                 continue;
 
             msi = pdev->vpci->msi;
-- 
2.16.4



* [Xen-devel] [PATCH 8/8] xen/x86: use keyhandler locks when dumping data to console
  2020-02-13 12:54 [Xen-devel] [PATCH 0/8] xen: don't let keyhandlers block indefinitely on locks Juergen Gross
                   ` (6 preceding siblings ...)
  2020-02-13 12:54 ` [Xen-devel] [PATCH 7/8] xen/drivers: " Juergen Gross
@ 2020-02-13 12:54 ` Juergen Gross
  2020-02-13 18:38 ` [Xen-devel] [PATCH 0/8] xen: don't let keyhandlers block indefinitely on locks Andrew Cooper
  8 siblings, 0 replies; 26+ messages in thread
From: Juergen Gross @ 2020-02-13 12:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monné

Instead of using the normal locks, use the keyhandler-provided trylocks
with timeouts. For __print_IO_APIC() the plain blocking variant is kept
for the boot-time invocation, where waiting for the lock is fine.
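
Note that IO-APIC entries are copied while holding the (bounded) lock
and printed only after dropping it, so no printing happens under the
lock (sketch, cf. the io_apic.c hunks below):

    if ( keyhandler_spin_lock_irqsave(&ioapic_lock, &flags,
                                      "could not get ioapic lock") )
    {
        rte = __ioapic_read_entry(entry->apic, pin, 0); /* copy under lock */
        spin_unlock_irqrestore(&ioapic_lock, flags);
        /* ... printk() the copied entry ... */
    }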

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/io_apic.c | 53 +++++++++++++++++++++++++++++++++++++-------------
 xen/arch/x86/irq.c     |  5 ++++-
 xen/arch/x86/msi.c     |  4 +++-
 xen/arch/x86/numa.c    | 16 +++++++++------
 4 files changed, 57 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index e98e08e9c8..4acdc566b9 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -1098,6 +1098,18 @@ static inline void UNEXPECTED_IO_APIC(void)
 {
 }
 
+static bool get_ioapic_lock(unsigned long *flags, bool boot)
+{
+    if ( boot )
+    {
+        spin_lock_irqsave(&ioapic_lock, *flags);
+        return true;
+    }
+
+    return keyhandler_spin_lock_irqsave(&ioapic_lock, flags,
+                                        "could not get ioapic lock");
+}
+
 static void /*__init*/ __print_IO_APIC(bool boot)
 {
     int apic, i;
@@ -1125,13 +1137,16 @@ static void /*__init*/ __print_IO_APIC(bool boot)
         if (!nr_ioapic_entries[apic])
             continue;
 
-	spin_lock_irqsave(&ioapic_lock, flags);
+        if ( !get_ioapic_lock(&flags, boot) )
+                continue;
+
 	reg_00.raw = io_apic_read(apic, 0);
 	reg_01.raw = io_apic_read(apic, 1);
 	if (reg_01.bits.version >= 0x10)
             reg_02.raw = io_apic_read(apic, 2);
 	if (reg_01.bits.version >= 0x20)
             reg_03.raw = io_apic_read(apic, 3);
+
 	spin_unlock_irqrestore(&ioapic_lock, flags);
 
 	printk(KERN_DEBUG "IO APIC #%d......\n", mp_ioapics[apic].mpc_apicid);
@@ -1201,7 +1216,12 @@ static void /*__init*/ __print_IO_APIC(bool boot)
 	for (i = 0; i <= reg_01.bits.entries; i++) {
             struct IO_APIC_route_entry entry;
 
-            entry = ioapic_read_entry(apic, i, 0);
+            if ( !get_ioapic_lock(&flags, boot) )
+                continue;
+
+            entry = __ioapic_read_entry(apic, i, 0);
+
+            spin_unlock_irqrestore(&ioapic_lock, flags);
 
             if ( x2apic_enabled && iommu_intremap )
                 printk(KERN_DEBUG " %02x %08x", i, entry.dest.dest32);
@@ -2495,21 +2515,28 @@ void dump_ioapic_irq_info(void)
 
         for ( ; ; )
         {
+            unsigned long flags;
+
             pin = entry->pin;
 
             printk("      Apic 0x%02x, Pin %2d: ", entry->apic, pin);
 
-            rte = ioapic_read_entry(entry->apic, pin, 0);
-
-            printk("vec=%02x delivery=%-5s dest=%c status=%d "
-                   "polarity=%d irr=%d trig=%c mask=%d dest_id:%0*x\n",
-                   rte.vector, delivery_mode_2_str(rte.delivery_mode),
-                   rte.dest_mode ? 'L' : 'P',
-                   rte.delivery_status, rte.polarity, rte.irr,
-                   rte.trigger ? 'L' : 'E', rte.mask,
-                   (x2apic_enabled && iommu_intremap) ? 8 : 2,
-                   (x2apic_enabled && iommu_intremap) ?
-                       rte.dest.dest32 : rte.dest.logical.logical_dest);
+            if ( keyhandler_spin_lock_irqsave(&ioapic_lock, &flags,
+                                              "could not get ioapic lock") )
+            {
+                rte = __ioapic_read_entry(entry->apic, pin, 0);
+                spin_unlock_irqrestore(&ioapic_lock, flags);
+
+                printk("vec=%02x delivery=%-5s dest=%c status=%d "
+                       "polarity=%d irr=%d trig=%c mask=%d dest_id:%0*x\n",
+                       rte.vector, delivery_mode_2_str(rte.delivery_mode),
+                       rte.dest_mode ? 'L' : 'P',
+                       rte.delivery_status, rte.polarity, rte.irr,
+                       rte.trigger ? 'L' : 'E', rte.mask,
+                       (x2apic_enabled && iommu_intremap) ? 8 : 2,
+                       (x2apic_enabled && iommu_intremap) ?
+                           rte.dest.dest32 : rte.dest.logical.logical_dest);
+            }
 
             if ( entry->next == 0 )
                 break;
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index cc2eb8e925..f3d931b121 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2470,7 +2470,9 @@ static void dump_irqs(unsigned char key)
 
         ssid = in_irq() ? NULL : xsm_show_irq_sid(irq);
 
-        spin_lock_irqsave(&desc->lock, flags);
+        if ( !keyhandler_spin_lock_irqsave(&desc->lock, &flags,
+                                           "could not get irq lock") )
+            goto free_ssid;
 
         printk("   IRQ:%4d vec:%02x %-15s status=%03x aff:{%*pbl}/{%*pbl} ",
                irq, desc->arch.vector, desc->handler->typename, desc->status,
@@ -2506,6 +2508,7 @@ static void dump_irqs(unsigned char key)
 
         spin_unlock_irqrestore(&desc->lock, flags);
 
+ free_ssid:
         xfree(ssid);
     }
 
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index c85cf9f85a..d10b856179 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -1470,7 +1470,9 @@ static void dump_msi(unsigned char key)
         if ( !irq_desc_initialized(desc) )
             continue;
 
-        spin_lock_irqsave(&desc->lock, flags);
+        if ( !keyhandler_spin_lock_irqsave(&desc->lock, &flags,
+                                           "could not get irq lock") )
+            continue;
 
         entry = desc->msi_desc;
         if ( !entry )
diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
index 6ef15b34d5..d21ed8737f 100644
--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -425,18 +425,22 @@ static void dump_numa(unsigned char key)
         for_each_online_node ( i )
             page_num_node[i] = 0;
 
-        spin_lock(&d->page_alloc_lock);
-        page_list_for_each(page, &d->page_list)
+        if ( keyhandler_spin_lock(&d->page_alloc_lock,
+                                  "could not get page_alloc lock") )
         {
-            i = phys_to_nid(page_to_maddr(page));
-            page_num_node[i]++;
+            page_list_for_each(page, &d->page_list)
+            {
+                i = phys_to_nid(page_to_maddr(page));
+                page_num_node[i]++;
+            }
+            spin_unlock(&d->page_alloc_lock);
         }
-        spin_unlock(&d->page_alloc_lock);
 
         for_each_online_node ( i )
             printk("    Node %u: %u\n", i, page_num_node[i]);
 
-        if ( !read_trylock(&d->vnuma_rwlock) )
+        if ( !keyhandler_read_lock(&d->vnuma_rwlock,
+                                   "could not get vnuma lock") )
             continue;
 
         if ( !d->vnuma )
-- 
2.16.4



* Re: [Xen-devel] [PATCH 1/8] xen: make rangeset_printk() static
  2020-02-13 12:54 ` [Xen-devel] [PATCH 1/8] xen: make rangeset_printk() static Juergen Gross
@ 2020-02-13 14:00   ` Jan Beulich
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2020-02-13 14:00 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, xen-devel

On 13.02.2020 13:54, Juergen Gross wrote:
> rangeset_printk() is only used locally, so it can be made static.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

Acked-by: Jan Beulich <jbeulich@suse.com>



* Re: [Xen-devel] [PATCH 2/8] xen: add using domlist_read_lock in keyhandlers
  2020-02-13 12:54 ` [Xen-devel] [PATCH 2/8] xen: add using domlist_read_lock in keyhandlers Juergen Gross
@ 2020-02-13 14:01   ` Jan Beulich
  2020-02-13 14:09   ` George Dunlap
  2020-02-18  5:42   ` Tian, Kevin
  2 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2020-02-13 14:01 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Jun Nakajima, xen-devel, Roger Pau Monné

On 13.02.2020 13:54, Juergen Gross wrote:
> Using for_each_domain() without holding the domlist_read_lock is
> fragile, so add the lock in the keyhandlers where it is missing.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

Where applicable
Acked-by: Jan Beulich <jbeulich@suse.com>



* Re: [Xen-devel] [PATCH 2/8] xen: add using domlist_read_lock in keyhandlers
  2020-02-13 12:54 ` [Xen-devel] [PATCH 2/8] xen: add using domlist_read_lock in keyhandlers Juergen Gross
  2020-02-13 14:01   ` Jan Beulich
@ 2020-02-13 14:09   ` George Dunlap
  2020-02-18  5:42   ` Tian, Kevin
  2 siblings, 0 replies; 26+ messages in thread
From: George Dunlap @ 2020-02-13 14:09 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jun Nakajima,
	Wei Liu, Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Ian Jackson, Jan Beulich, Roger Pau Monné

On 2/13/20 12:54 PM, Juergen Gross wrote:
> Using for_each_domain() without holding the domlist_read_lock is
> fragile, so add the lock in the keyhandlers where it is missing.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  xen/arch/x86/mm/p2m-ept.c       | 4 ++++

p2m bits:

Acked-by: George Dunlap <george.dunlap@citrix.com>


* Re: [Xen-devel] [PATCH 0/8] xen: don't let keyhandlers block indefinitely on locks
  2020-02-13 12:54 [Xen-devel] [PATCH 0/8] xen: don't let keyhandlers block indefinitely on locks Juergen Gross
                   ` (7 preceding siblings ...)
  2020-02-13 12:54 ` [Xen-devel] [PATCH 8/8] xen/x86: " Juergen Gross
@ 2020-02-13 18:38 ` Andrew Cooper
  2020-02-14  6:05   ` Jürgen Groß
  2020-02-14  9:37   ` Jan Beulich
  8 siblings, 2 replies; 26+ messages in thread
From: Andrew Cooper @ 2020-02-13 18:38 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jun Nakajima,
	Wei Liu, Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson,
	Dario Faggioli, Ross Lagerwall, Meng Xu, Jan Beulich,
	Roger Pau Monné

On 13/02/2020 12:54, Juergen Gross wrote:
> Keyhandlers dumping hypervisor information to the console often need
> to take locks while accessing data. In order to not block in case of
> system inconsistencies it is convenient to use trylock variants when
> obtaining the locks. On the other hand a busy system might easily
> encounter held locks, so this patch series is adding special trylock
> variants with a timeout used by keyhandlers.

This is a backwards step.

Keyhandlers are for debugging purposes.  When debugging it is far more 
important to get the requested data, than almost anything else.

The system will cope with a multi-second outage occurring approximately 
never.  A person debugging who can't get the data has no chance of 
fixing whatever problem they are looking for.

This series seems to be breaking the one critical usecase for 
keyhandlers, to fix what - not let debugging get in the way of the 
smooth running of the system?  A system in need of debugging in the 
first place has bigger problems than needing to run smoothly.

The only thing which should happen to improve system stability is for 
keyhandlers to disable the system watchdog while they are running, in 
case they happen to run for seconds of wallclock time. This is an issue 
which isn't addressed by the series, because once a keyhandler does get 
a lock, it keeps it until it is done.

~Andrew


* Re: [Xen-devel] [PATCH 0/8] xen: don't let keyhandlers block indefinitely on locks
  2020-02-13 18:38 ` [Xen-devel] [PATCH 0/8] xen: don't let keyhandlers block indefinitely on locks Andrew Cooper
@ 2020-02-14  6:05   ` Jürgen Groß
  2020-02-14  9:37   ` Jan Beulich
  1 sibling, 0 replies; 26+ messages in thread
From: Jürgen Groß @ 2020-02-14  6:05 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jun Nakajima,
	Wei Liu, Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson,
	Dario Faggioli, Ross Lagerwall, Meng Xu, Jan Beulich,
	Roger Pau Monné

On 13.02.20 19:38, Andrew Cooper wrote:
> On 13/02/2020 12:54, Juergen Gross wrote:
>> Keyhandlers dumping hypervisor information to the console often need
>> to take locks while accessing data. In order to not block in case of
>> system inconsistencies it is convenient to use trylock variants when
>> obtaining the locks. On the other hand a busy system might easily
>> encounter held locks, so this patch series is adding special trylock
>> variants with a timeout used by keyhandlers.
> 
> This is a backwards step.
> 
> Keyhandlers are for debugging purposes.  When debugging it is far more 
> important to get the requested data, than almost anything else.

Right.

> 
> The system will cope with a multi-second outage occurring approximately 
> never.  A person debugging who can't get the data has no chance of 
> fixing whatever problem they are looking for.

Right.

> This series seems to be breaking the one critical usecase for 
> keyhandlers, to fix what - not let debugging get in the way of the 
> smooth running of the system?  A system in need of debugging in the 
> first place has bigger problems than needing to run smoothly.

Okay, this warrants a longer default timeout.

A keyhandler blocking on a lock will produce exactly no further data,
and it will probably block other keyhandlers, too, due to hogging at
least one cpu completely.

With a longer lock timeout (1 second?) there is a much higher chance
that the keyhandler will finish its job producing more data than
without any timeout.
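
Something like this minimal sketch is what I have in mind for the
helper (only a sketch: the real keyhandler_spin_lock() in patch 4 may
differ, and "keyhandler_lock_timeout" is just an assumed name for the
parameter mentioned in the cover letter):

    bool keyhandler_spin_lock(spinlock_t *lock, const char *msg)
    {
        /* Spin until the lock is taken or the deadline passes. */
        s_time_t deadline = NOW() + MILLISECS(keyhandler_lock_timeout);

        do
        {
            if ( spin_trylock(lock) )
                return true;
            cpu_relax();
        } while ( NOW() < deadline );

        printk("Keyhandler: %s\n", msg);
        return false;
    }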

BTW, during development of my core scheduling series I was hit by that
problem multiple times. With the lock timeout I'd have been spared
dozens of reboots.

> The only thing which should happen to improve system stability is for 
> keyhandlers to disable the system watchdog while they are running, in 
> case they happen to run for seconds of wallclock time. This is an issue 
> which isn't addressed by the series, because once a keyhandler does get 
> a lock, it keeps it until it is done.

Right, will add disabling the watchdog during keyhandler action.
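
Roughly like this (a sketch only: watchdog_disable()/watchdog_enable()
exist on x86, but the dispatch function shown here and its exact shape
are simplified assumptions, not the final patch):

    void keyhandler_dispatch(unsigned char key, struct cpu_user_regs *regs)
    {
        struct keyhandler *h = &key_table[key];

        watchdog_disable();    /* dump may run for seconds of wallclock time */
        if ( h->irq_callback )
            h->irq_fn(key, regs);
        else
            h->fn(key);
        watchdog_enable();
    }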


Juergen


* Re: [Xen-devel] [PATCH 0/8] xen: don't let keyhandlers block indefinitely on locks
  2020-02-13 18:38 ` [Xen-devel] [PATCH 0/8] xen: don't let keyhandlers block indefinitely on locks Andrew Cooper
  2020-02-14  6:05   ` Jürgen Groß
@ 2020-02-14  9:37   ` Jan Beulich
  2020-02-19 12:14     ` Julien Grall
  1 sibling, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2020-02-14  9:37 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Kevin Tian, Stefano Stabellini, Julien Grall,
	Wei Liu, Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson,
	Dario Faggioli, Ross Lagerwall, Meng Xu, Jun Nakajima, xen-devel,
	Roger Pau Monné

On 13.02.2020 19:38, Andrew Cooper wrote:
> On 13/02/2020 12:54, Juergen Gross wrote:
>> Keyhandlers dumping hypervisor information to the console often need
>> to take locks while accessing data. In order to not block in case of
>> system inconsistencies it is convenient to use trylock variants when
>> obtaining the locks. On the other hand a busy system might easily
>> encounter held locks, so this patch series is adding special trylock
>> variants with a timeout used by keyhandlers.
> 
> This is a backwards step.
> 
> Keyhandlers are for debugging purposes.  When debugging it is far more 
> important to get the requested data, than almost anything else.
> 
> The system will cope with a multi-second outage occurring approximately 
> never.  A person debugging who can't get the data has no chance of 
> fixing whatever problem they are looking for.
> 
> This series seems to be breaking the one critical usecase for 
> keyhandlers, to fix what - not let debugging get in the way of the 
> smooth running of the system?  A system in need of debugging in the 
> first place has bigger problems than needing to run smoothly.

I certainly accept what you say further up, but I don't think this
last statement is universally true. There may be a single guest in
trouble, which - to find out about its state - some debugging keys
may want issuing. Disturbing the host and all other guests for this
is not a good idea imo.

Jan


* Re: [Xen-devel] [PATCH 2/8] xen: add using domlist_read_lock in keyhandlers
  2020-02-13 12:54 ` [Xen-devel] [PATCH 2/8] xen: add using domlist_read_lock in keyhandlers Juergen Gross
  2020-02-13 14:01   ` Jan Beulich
  2020-02-13 14:09   ` George Dunlap
@ 2020-02-18  5:42   ` Tian, Kevin
  2 siblings, 0 replies; 26+ messages in thread
From: Tian, Kevin @ 2020-02-18  5:42 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: Stefano Stabellini, Julien Grall, Nakajima, Jun, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Jan Beulich, Roger Pau Monné

> From: Juergen Gross <jgross@suse.com>
> Sent: Thursday, February 13, 2020 8:55 PM
> 
> Using for_each_domain() without holding the domlist_read_lock is
> fragile, so add the lock in the keyhandlers where it is missing.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* Re: [Xen-devel] [PATCH 0/8] xen: don't let keyhandlers block indefinitely on locks
  2020-02-14  9:37   ` Jan Beulich
@ 2020-02-19 12:14     ` Julien Grall
  0 siblings, 0 replies; 26+ messages in thread
From: Julien Grall @ 2020-02-19 12:14 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: Juergen Gross, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Ian Jackson,
	Dario Faggioli, Ross Lagerwall, Meng Xu, Jun Nakajima, xen-devel,
	Roger Pau Monné

Hi Jan,

On 14/02/2020 09:37, Jan Beulich wrote:
> On 13.02.2020 19:38, Andrew Cooper wrote:
>> On 13/02/2020 12:54, Juergen Gross wrote:
>>> Keyhandlers dumping hypervisor information to the console often need
>>> to take locks while accessing data. In order to not block in case of
>>> system inconsistencies it is convenient to use trylock variants when
>>> obtaining the locks. On the other hand a busy system might easily
>>> encounter held locks, so this patch series is adding special trylock
>>> variants with a timeout used by keyhandlers.
>>
>> This is a backwards step.
>>
>> Keyhandlers are for debugging purposes.  When debugging it is far more
>> important to get the requested data, than almost anything else.
>>
>> The system will cope with a multi-second outage occurring approximately
>> never.  A person debugging who can't get the data has no chance of
>> fixing whatever problem they are looking for.
>>
>> This series seems to be breaking the one critical usecase for
>> keyhandlers, to fix what - not let debugging get in the way of the
>> smooth running of the system?  A system in need of debugging in the
>> first place has bigger problems than needing to run smoothly.
> 
> I certainly accept what you say further up, but I don't think this
> last statement is universally true. There may be a single guest in
> trouble, which - to find out about its state - some debugging keys
> may want issuing. Disturbing the host and all other guests for this
> is not a good idea imo.

This seems to suggest that you only want information for a single guest.
Therefore using debugging keys is already a bad idea, because it will
disturb all the other guests.

For your setup, it might be worth considering extending xenctx or
introducing a way to dump information for a specific domain.

Cheers,

-- 
Julien Grall


* Re: [Xen-devel] [PATCH 3/8] xen/sched: don't use irqsave locks in dumping functions
  2020-02-13 12:54 ` [Xen-devel] [PATCH 3/8] xen/sched: don't use irqsave locks in dumping functions Juergen Gross
@ 2020-02-19 12:40   ` Dario Faggioli
  2020-02-19 14:27   ` Jan Beulich
  1 sibling, 0 replies; 26+ messages in thread
From: Dario Faggioli @ 2020-02-19 12:40 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: George Dunlap, Meng Xu



On Thu, 2020-02-13 at 13:54 +0100, Juergen Gross wrote:
> All dumping functions invoked by the "runq" keyhandler are called with
> disabled interrupts, so there is no need to use the irqsave variants
> of any locks in those functions.
> 
> 
To me, this patch looks pretty independent from the series. I.e.,
whatever it is that we will do with locking for keyhandlers, it's ok
(actually, it's better!) to just use spin_lock() in scheduler specific
code.

So:

> Signed-off-by: Juergen Gross <jgross@suse.com>
>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [Xen-devel] [PATCH 3/8] xen/sched: don't use irqsave locks in dumping functions
  2020-02-13 12:54 ` [Xen-devel] [PATCH 3/8] xen/sched: don't use irqsave locks in dumping functions Juergen Gross
  2020-02-19 12:40   ` Dario Faggioli
@ 2020-02-19 14:27   ` Jan Beulich
  2020-02-19 15:02     ` Jürgen Groß
  1 sibling, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2020-02-19 14:27 UTC (permalink / raw)
  To: Juergen Gross; +Cc: George Dunlap, xen-devel, Meng Xu, Dario Faggioli

On 13.02.2020 13:54, Juergen Gross wrote:
> All dumping functions invoked by the "runq" keyhandler are called with
> disabled interrupts,

Is this actually needed for anything? It means not servicing
interrupts for perhaps an extended period of time. Debug keys
aren't promised to be non-intrusive, but they also shouldn't
be more intrusive than really needed. Wouldn't it therefore
be better to keep locking as it is now, and instead make sure
interrupts get turned off elsewhere (if needed) for much
shorter periods of time?

Jan

> so there is no need to use the irqsave variants
> of any locks in those functions.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  xen/common/sched/credit.c  | 10 ++++------
>  xen/common/sched/credit2.c |  5 ++---
>  xen/common/sched/null.c    | 10 ++++------
>  xen/common/sched/rt.c      | 10 ++++------
>  4 files changed, 14 insertions(+), 21 deletions(-)
> 
> diff --git a/xen/common/sched/credit.c b/xen/common/sched/credit.c
> index 05946eea6e..dee87e7fe2 100644
> --- a/xen/common/sched/credit.c
> +++ b/xen/common/sched/credit.c
> @@ -2048,7 +2048,6 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
>      const struct csched_pcpu *spc;
>      const struct csched_unit *svc;
>      spinlock_t *lock;
> -    unsigned long flags;
>      int loop;
>  
>      /*
> @@ -2058,7 +2057,7 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
>       * - we scan through the runqueue, so we need the proper runqueue
>       *   lock (the one of the runqueue of this cpu).
>       */
> -    spin_lock_irqsave(&prv->lock, flags);
> +    spin_lock(&prv->lock);
>      lock = pcpu_schedule_lock(cpu);
>  
>      spc = CSCHED_PCPU(cpu);
> @@ -2089,7 +2088,7 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu)
>      }
>  
>      pcpu_schedule_unlock(lock, cpu);
> -    spin_unlock_irqrestore(&prv->lock, flags);
> +    spin_unlock(&prv->lock);
>  }
>  
>  static void
> @@ -2098,9 +2097,8 @@ csched_dump(const struct scheduler *ops)
>      struct list_head *iter_sdom, *iter_svc;
>      struct csched_private *prv = CSCHED_PRIV(ops);
>      int loop;
> -    unsigned long flags;
>  
> -    spin_lock_irqsave(&prv->lock, flags);
> +    spin_lock(&prv->lock);
>  
>      printk("info:\n"
>             "\tncpus              = %u\n"
> @@ -2153,7 +2151,7 @@ csched_dump(const struct scheduler *ops)
>          }
>      }
>  
> -    spin_unlock_irqrestore(&prv->lock, flags);
> +    spin_unlock(&prv->lock);
>  }
>  
>  static int __init
> diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
> index f2752f27e2..e76d2ed543 100644
> --- a/xen/common/sched/credit2.c
> +++ b/xen/common/sched/credit2.c
> @@ -3649,14 +3649,13 @@ csched2_dump(const struct scheduler *ops)
>  {
>      struct list_head *iter_sdom;
>      struct csched2_private *prv = csched2_priv(ops);
> -    unsigned long flags;
>      unsigned int i, j, loop;
>  
>      /*
>       * We need the private scheduler lock as we access global
>       * scheduler data and (below) the list of active domains.
>       */
> -    read_lock_irqsave(&prv->lock, flags);
> +    read_lock(&prv->lock);
>  
>      printk("Active queues: %d\n"
>             "\tdefault-weight     = %d\n",
> @@ -3749,7 +3748,7 @@ csched2_dump(const struct scheduler *ops)
>          spin_unlock(&rqd->lock);
>      }
>  
> -    read_unlock_irqrestore(&prv->lock, flags);
> +    read_unlock(&prv->lock);
>  }
>  
>  static void *
> diff --git a/xen/common/sched/null.c b/xen/common/sched/null.c
> index 8c3101649d..3b31703d7e 100644
> --- a/xen/common/sched/null.c
> +++ b/xen/common/sched/null.c
> @@ -954,9 +954,8 @@ static void null_dump_pcpu(const struct scheduler *ops, int cpu)
>      const struct null_pcpu *npc = get_sched_res(cpu)->sched_priv;
>      const struct null_unit *nvc;
>      spinlock_t *lock;
> -    unsigned long flags;
>  
> -    lock = pcpu_schedule_lock_irqsave(cpu, &flags);
> +    lock = pcpu_schedule_lock(cpu);
>  
>      printk("CPU[%02d] sibling={%*pbl}, core={%*pbl}",
>             cpu, CPUMASK_PR(per_cpu(cpu_sibling_mask, cpu)),
> @@ -974,17 +973,16 @@ static void null_dump_pcpu(const struct scheduler *ops, int cpu)
>          printk("\n");
>      }
>  
> -    pcpu_schedule_unlock_irqrestore(lock, flags, cpu);
> +    pcpu_schedule_unlock(lock, cpu);
>  }
>  
>  static void null_dump(const struct scheduler *ops)
>  {
>      struct null_private *prv = null_priv(ops);
>      struct list_head *iter;
> -    unsigned long flags;
>      unsigned int loop;
>  
> -    spin_lock_irqsave(&prv->lock, flags);
> +    spin_lock(&prv->lock);
>  
>      printk("\tcpus_free = %*pbl\n", CPUMASK_PR(&prv->cpus_free));
>  
> @@ -1029,7 +1027,7 @@ static void null_dump(const struct scheduler *ops)
>      printk("\n");
>      spin_unlock(&prv->waitq_lock);
>  
> -    spin_unlock_irqrestore(&prv->lock, flags);
> +    spin_unlock(&prv->lock);
>  }
>  
>  static const struct scheduler sched_null_def = {
> diff --git a/xen/common/sched/rt.c b/xen/common/sched/rt.c
> index 66585ed50a..16379cb2d2 100644
> --- a/xen/common/sched/rt.c
> +++ b/xen/common/sched/rt.c
> @@ -353,9 +353,8 @@ rt_dump_pcpu(const struct scheduler *ops, int cpu)
>  {
>      struct rt_private *prv = rt_priv(ops);
>      const struct rt_unit *svc;
> -    unsigned long flags;
>  
> -    spin_lock_irqsave(&prv->lock, flags);
> +    spin_lock(&prv->lock);
>      printk("CPU[%02d]\n", cpu);
>      /* current UNIT (nothing to say if that's the idle unit). */
>      svc = rt_unit(curr_on_cpu(cpu));
> @@ -363,7 +362,7 @@ rt_dump_pcpu(const struct scheduler *ops, int cpu)
>      {
>          rt_dump_unit(ops, svc);
>      }
> -    spin_unlock_irqrestore(&prv->lock, flags);
> +    spin_unlock(&prv->lock);
>  }
>  
>  static void
> @@ -373,9 +372,8 @@ rt_dump(const struct scheduler *ops)
>      struct rt_private *prv = rt_priv(ops);
>      const struct rt_unit *svc;
>      const struct rt_dom *sdom;
> -    unsigned long flags;
>  
> -    spin_lock_irqsave(&prv->lock, flags);
> +    spin_lock(&prv->lock);
>  
>      if ( list_empty(&prv->sdom) )
>          goto out;
> @@ -421,7 +419,7 @@ rt_dump(const struct scheduler *ops)
>      }
>  
>   out:
> -    spin_unlock_irqrestore(&prv->lock, flags);
> +    spin_unlock(&prv->lock);
>  }
>  
>  /*
> 



* Re: [Xen-devel] [PATCH 5/8] xen/sched: use keyhandler locks when dumping data to console
  2020-02-13 12:54 ` [Xen-devel] [PATCH 5/8] xen/sched: use keyhandler locks when dumping data to console Juergen Gross
@ 2020-02-19 14:31   ` Dario Faggioli
  2020-02-19 15:09     ` Jürgen Groß
  0 siblings, 1 reply; 26+ messages in thread
From: Dario Faggioli @ 2020-02-19 14:31 UTC (permalink / raw)
  To: Juergen Gross, xen-devel; +Cc: George Dunlap, Meng Xu



On Thu, 2020-02-13 at 13:54 +0100, Juergen Gross wrote:
> Instead of using the normal locks use the keyhandler provided trylocks
> with timeouts. This requires a special primitive for the scheduler lock.
> 
So, FWIW, I tend to agree with Andrew on the general aspects of this.
I.e., I personally don't think that the added complexity, however small
one may judge it to be, is worth it... I'm not even sure not using
regular locks is really an improvement.

When you mentioned, in your other mail, that having something like this
would have saved a lot of reboots during the development of core-
scheduling, would _just_ disabling the watchdog have achieved the same?

Anyway, I've had a look at this patch. _If_ we go with this new lock
thing, the modifications to the scheduler code done here seems fine to
me.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [Xen-devel] [PATCH 3/8] xen/sched: don't use irqsave locks in dumping functions
  2020-02-19 14:27   ` Jan Beulich
@ 2020-02-19 15:02     ` Jürgen Groß
  2020-02-19 15:47       ` Dario Faggioli
  0 siblings, 1 reply; 26+ messages in thread
From: Jürgen Groß @ 2020-02-19 15:02 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, xen-devel, Meng Xu, Dario Faggioli

On 19.02.20 15:27, Jan Beulich wrote:
> On 13.02.2020 13:54, Juergen Gross wrote:
>> All dumping functions invoked by the "runq" keyhandler are called with
>> disabled interrupts,
> 
> Is this actually needed for anything? It means not servicing
> interrupts for perhaps an extended period of time. Debug keys
> aren't promised to be non-intrusive, but they also shouldn't
> be more intrusive than really needed. Wouldn't it therefore
> be better to keep locking as it is now, and instead make sure
> interrupts get turned off elsewhere (if needed) for much
> shorter periods of time?

Indeed this is the better option. I just checked the code and
think blindly turning interrupts off is not needed.

I'll rework the patch and send it out separately.


Juergen


* Re: [Xen-devel] [PATCH 5/8] xen/sched: use keyhandler locks when dumping data to console
  2020-02-19 14:31   ` Dario Faggioli
@ 2020-02-19 15:09     ` Jürgen Groß
  0 siblings, 0 replies; 26+ messages in thread
From: Jürgen Groß @ 2020-02-19 15:09 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Meng Xu

On 19.02.20 15:31, Dario Faggioli wrote:
> On Thu, 2020-02-13 at 13:54 +0100, Juergen Gross wrote:
>> Instead of using the normal locks use the keyhandler provided trylocks
>> with timeouts. This requires a special primitive for the scheduler lock.
>>
> So, FWIW, I tend to agree with Andrew on the general aspects of this.
> I.e., I personally don't think that the added complexity, however small
> one may judge it to be, is worth it... I'm not even sure not using
> regular locks is really an improvement.
> 
> When you mentioned, in your other mail, that having something like this
> would have saved a lot of reboots during the development of core-
> scheduling, would _just_ disabling the watchdog have achieved the same?

No.

I was hit by the keyhandler just waiting for a lock which would never
be freed. This blocked printing information for cpus I would have
liked to see (chances were about 50% in my case that the "interesting"
cpu had a higher cpu number than the locked one).

> 
> Anyway, I've had a look at this patch. _If_ we go with this new lock
> thing, the modifications to the scheduler code done here seems fine to
> me.

Thanks.


Juergen


* Re: [Xen-devel] [PATCH 3/8] xen/sched: don't use irqsave locks in dumping functions
  2020-02-19 15:02     ` Jürgen Groß
@ 2020-02-19 15:47       ` Dario Faggioli
  0 siblings, 0 replies; 26+ messages in thread
From: Dario Faggioli @ 2020-02-19 15:47 UTC (permalink / raw)
  To: Jürgen Groß, Jan Beulich; +Cc: George Dunlap, xen-devel, Meng Xu



On Wed, 2020-02-19 at 16:02 +0100, Jürgen Groß wrote:
> On 19.02.20 15:27, Jan Beulich wrote:
> > On 13.02.2020 13:54, Juergen Gross wrote:
> > > All dumping functions invoked by the "runq" keyhandler are called
> > > with
> > > disabled interrupts,
> > 
> > Is this actually needed for anything? It means not servicing
> > interrupts for perhaps an extended period of time. Debug keys
> > aren't promised to be non-intrusive, but they also shouldn't
> > be more intrusive than really needed. Wouldn't it therefore
> > be better to keep locking as it is now, and instead make sure
> > interrupts get turned off elsewhere (if needed) for much
> > shorter periods of time?
> 
> Indeed this is the better option. I just checked the code and
> think blindly turning interrupts off is not needed.
> 
Well, yes... Assuming you are referring to the IRQ being disabled in
cpupool.c:dump_runq(), my impression is that we can get rid of that,
and leave the sched-specific code (more or less) as it is (for the sake
of runqueue lock irq-safety).
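
I.e., something like this (a hypothetical rework, assuming the current
dump_runq() in cpupool.c brackets the whole dump in local_irq_save()):

    static void dump_runq(unsigned char key)
    {
        struct cpupool **c;

        /*
         * No local_irq_save() here any more: interrupts are then only
         * disabled inside the per-scheduler dump functions, via the
         * (kept) irqsave runqueue lock variants, for short periods.
         */
        spin_lock(&cpupool_lock);

        printk("sched_smt_power_savings: %sabled\n",
               sched_smt_power_savings ? "en" : "dis");
        printk("NOW=%"PRI_stime"\n", NOW());

        schedule_dump(NULL);      /* CPUs outside any pool */
        for_each_cpupool ( c )
            schedule_dump(*c);

        spin_unlock(&cpupool_lock);
    }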

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [Xen-devel] [PATCH 4/8] xen: add locks with timeouts for keyhandlers
  2020-02-13 12:54 ` [Xen-devel] [PATCH 4/8] xen: add locks with timeouts for keyhandlers Juergen Gross
@ 2020-03-05 15:25   ` Jan Beulich
  2020-03-06  8:08     ` Jürgen Groß
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2020-03-05 15:25 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, xen-devel,
	Roger Pau Monné

On 13.02.2020 13:54, Juergen Gross wrote:
> Most keyhandlers are used to dump hypervisor data to the console and
> they are used mostly for debugging purposes. In those cases it might
> happen that some data structures are locked and thus block the
> handler from accessing the data.
> 
> In order to still be able to get some information, don't use plain
> locking functions in the keyhandlers, but a variant of trylocks with
> a timeout value. This allows waiting for some time and giving up in
> case the lock was not obtained.
> 
> Add the main infrastructure for this feature, including a new runtime
> parameter allowing the timeout value to be specified in milliseconds.
> 
> Use the new locking scheme in the handlers defined in keyhandler.c.

Personally I think trylock (as already used in some places) is the
way to go. Iirc others disagreed, but also didn't like the approach
taken here. I'm not intending to stand in the way if a majority
approves of this model, but I'm not going to ack these changes
myself.

Jan


* Re: [Xen-devel] [PATCH 4/8] xen: add locks with timeouts for keyhandlers
  2020-03-05 15:25   ` Jan Beulich
@ 2020-03-06  8:08     ` Jürgen Groß
  2020-03-06  8:15       ` Jürgen Groß
  0 siblings, 1 reply; 26+ messages in thread
From: Jürgen Groß @ 2020-03-06  8:08 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, xen-devel,
	Roger Pau Monné

On 05.03.20 16:25, Jan Beulich wrote:
> On 13.02.2020 13:54, Juergen Gross wrote:
>> Most keyhandlers are used to dump hypervisor data to the console and
>> they are used mostly for debugging purposes. In those cases it might
>> happen that some data structures are locked and thus block the
>> handler from accessing the data.
>>
>> In order to still be able to get some information, don't use plain
>> locking functions in the keyhandlers, but a variant of trylocks with
>> a timeout value. This allows waiting for some time and giving up in
>> case the lock was not obtained.
>>
>> Add the main infrastructure for this feature, including a new runtime
>> parameter allowing the timeout value to be specified in milliseconds.
>>
>> Use the new locking scheme in the handlers defined in keyhandler.c.
> 
> Personally I think trylock (as already used in some places) is the
> way to go. Iirc others disagreed, but also didn't like the approach
> taken here. I'm not intending to stand in the way if a majority
> approves of this model, but I'm not going to ack these changes
> myself.

Fair enough.

BTW, trylock is used at exactly three places: for dumping vNUMA, MSI and
livepatch info.

And TBH: the vNUMA case is really strange, as this is a rwlock which is
held as writer only in one place for a very brief time period when
freeing the domain's vnuma data.

The MSI case is more complicated, and looking at it in more detail I've
realized that there is another trylock hidden in a subfunction:
vpci_msix_arch_print(). As vpci_msix_arch_print() will drop the lock in
the error case, the interface to this function is rather weird. In
addition, the occasional softirq processing is error-prone, too, as it
will happen only if a single domain has at least 64 MSI entries. In case
of lots of domains with up to 63 entries each, watchdog timeouts can
still happen, so I'll send a patch repairing this issue by letting
vpci_msix_arch_print() dump only one entry and putting the loop and
softirq handling into vpci_dump_msi().
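
The rough shape I'm thinking of (names partly assumed, e.g. the
single-entry printer; process_pending_softirqs() is the existing
helper):

    /*
     * Sketch: the per-entry printer no longer loops itself; the caller
     * loops and services softirqs after every entry, so the watchdog
     * stays happy even with many domains holding few entries each.
     */
    static void dump_msix(const struct vpci_msix *msix)
    {
        unsigned int i;

        for ( i = 0; i < msix->max_entries; i++ )
        {
            if ( vpci_msix_arch_print_entry(msix, i) )
                break;                 /* printer dropped the lock */
            process_pending_softirqs();
        }
    }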


Juergen


* Re: [Xen-devel] [PATCH 4/8] xen: add locks with timeouts for keyhandlers
  2020-03-06  8:08     ` Jürgen Groß
@ 2020-03-06  8:15       ` Jürgen Groß
  0 siblings, 0 replies; 26+ messages in thread
From: Jürgen Groß @ 2020-03-06  8:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, xen-devel,
	Roger Pau Monné

On 06.03.20 09:08, Jürgen Groß wrote:
> On 05.03.20 16:25, Jan Beulich wrote:
>> On 13.02.2020 13:54, Juergen Gross wrote:
>>> Most keyhandlers are used to dump hypervisor data to the console and
>>> they are used mostly for debugging purposes. In those cases it might
>>> happen that some data structures are locked and thus block the
>>> handler from accessing the data.
>>>
>>> In order to still be able to get some information, don't use plain
>>> locking functions in the keyhandlers, but a variant of trylocks with
>>> a timeout value. This allows waiting for some time and giving up in
>>> case the lock was not obtained.
>>>
>>> Add the main infrastructure for this feature, including a new runtime
>>> parameter allowing the timeout value to be specified in milliseconds.
>>>
>>> Use the new locking scheme in the handlers defined in keyhandler.c.
>>
>> Personally I think trylock (as already used in some places) is the
>> way to go. Iirc others disagreed, but also didn't like the approach
>> taken here. I'm not intending to stand in the way if a majority
>> approves of this model, but I'm not going to ack these changes
>> myself.
> 
> Fair enough.
> 
> BTW, trylock is used at exactly three places: for dumping vNUMA, MSI and
> livepatch info.
> 
> And TBH: the vNUMA case is really strange, as this is a rwlock which is
> held as writer only in one place for a very brief time period when
> freeing the domain's vnuma data.
> 
> The MSI case is more complicated, and looking at it in more detail I've
> realized that there is another trylock hidden in a subfunction:
> vpci_msix_arch_print(). As vpci_msix_arch_print() will drop the lock in
> the error case, the interface to this function is rather weird. In
> addition, the occasional softirq processing is error-prone, too, as it
> will happen only if a single domain has at least 64 MSI entries. In case
> of lots of domains with up to 63 entries each, watchdog timeouts can
> still happen, so I'll send a patch repairing this issue by letting
> vpci_msix_arch_print() dump only one entry and putting the loop and
> softirq handling into vpci_dump_msi().

Oh, sorry, softirqs are processed often enough, I missed one call.

Nevertheless the locking interface wants to be corrected IMO.


Juergen

