xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH for-4.15 0/2] x86/hpet: Try to unbreak Ryzen 1800X systems
@ 2021-03-25 16:52 Andrew Cooper
  2021-03-25 16:52 ` [PATCH 1/2] x86/hpet: Factor hpet_enable_legacy_replacement_mode() out of hpet_setup() Andrew Cooper
  2021-03-25 16:52 ` [PATCH 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally Andrew Cooper
  0 siblings, 2 replies; 27+ messages in thread
From: Andrew Cooper @ 2021-03-25 16:52 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Ian Jackson, Marek Marczykowski-Górecki,
	Frédéric Pierret

This is a refinement of Jan's "[PATCH][4.15] x86/HPET: don't enable legacy
replacement mode unconditionally" to try and make Xen do the helpful thing on
boot, rather than requiring a non-default command line option to boot in the
first place.

Andrew Cooper (1):
  x86/hpet: Factor hpet_enable_legacy_replacement_mode() out of hpet_setup()

Jan Beulich (1):
  x86/hpet: Don't enable legacy replacement mode unconditionally

 docs/misc/xen-command-line.pandoc |  33 ++++++++
 xen/arch/x86/hpet.c               | 157 ++++++++++++++++++++++----------------
 xen/arch/x86/io_apic.c            |  26 +++++++
 xen/include/asm-x86/hpet.h        |   7 ++
 4 files changed, 157 insertions(+), 66 deletions(-)

-- 
2.11.0



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/2] x86/hpet: Factor hpet_enable_legacy_replacement_mode() out of hpet_setup()
  2021-03-25 16:52 [PATCH for-4.15 0/2] x86/hpet: Try to unbreak Ryzen 1800X systems Andrew Cooper
@ 2021-03-25 16:52 ` Andrew Cooper
  2021-03-26  9:59   ` Jan Beulich
  2021-03-25 16:52 ` [PATCH 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally Andrew Cooper
  1 sibling, 1 reply; 27+ messages in thread
From: Andrew Cooper @ 2021-03-25 16:52 UTC (permalink / raw)
  To: Xen-devel
  Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Ian Jackson, Marek Marczykowski-Górecki,
	Frédéric Pierret

... in preparation to introduce a second caller.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Ian Jackson <iwj@xenproject.org>
CC: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
CC: Frédéric Pierret <frederic.pierret@qubes-os.org>

For 4.15.  Pre-req for trying to unbreak AMD Ryzen 1800X systems.
---
 xen/arch/x86/hpet.c        | 116 ++++++++++++++++++++++++---------------------
 xen/include/asm-x86/hpet.h |   6 +++
 2 files changed, 68 insertions(+), 54 deletions(-)

diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
index 1ff005fb4a..c73135bb15 100644
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -754,11 +754,70 @@ int hpet_legacy_irq_tick(void)
 }
 
 static u32 *hpet_boot_cfg;
+static u64 __initdata hpet_rate;
+
+bool __init hpet_enable_legacy_replacement_mode(void)
+{
+    unsigned int id, cfg, c0_cfg, ticks, count;
+
+    if ( !hpet_rate ||
+         !((id = hpet_read32(HPET_ID)) & HPET_ID_LEGSUP) ||
+         (cfg = hpet_read32(HPET_CFG) & HPET_CFG_LEGACY) )
+        return false;
+
+    /* Stop the main counter. */
+    hpet_write32(cfg & ~HPET_CFG_ENABLE, HPET_CFG);
+
+    /* Reconfigure channel 0 to be 32bit periodic. */
+    c0_cfg = hpet_read32(HPET_Tn_CFG(0));
+    c0_cfg |= (HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
+               HPET_TN_32BIT);
+    hpet_write32(c0_cfg, HPET_Tn_CFG(0));
+
+    /*
+     * The exact period doesn't have to match a legacy PIT.  All we need
+     * is an interrupt queued up via the IO-APIC to check routing.
+     *
+     * Use HZ as the frequency.
+     */
+    ticks = ((SECONDS(1) / HZ) * div_sc(hpet_rate, SECONDS(1), 32)) >> 32;
+
+    count = hpet_read32(HPET_COUNTER);
+
+    /*
+     * HPET_TN_SETVAL above is atrociously documented in the spec.
+     *
+     * Periodic HPET channels have a main comparator register, and
+     * separate "accumulator" register.  Despite being named accumulator
+     * in the spec, this is not an accurate description of its behaviour
+     * or purpose.
+     *
+     * Each time an interrupt is generated, the "accumulator" register is
+     * re-added to the comparator set up the new period.
+     *
+     * Normally, writes to the CMP register update both registers.
+     * However, under these semantics, it is impossible to set up a
+     * periodic timer correctly without the main HPET counter being at 0.
+     *
+     * Instead, HPET_TN_SETVAL is a self-clearing control bit which we can
+     * use for periodic timers to mean that the second write to CMP
+     * updates the accumulator only, and not the absolute comparator
+     * value.
+     *
+     * This lets us set a period when the main counter isn't at 0.
+     */
+    hpet_write32(count + ticks, HPET_Tn_CMP(0));
+    hpet_write32(ticks,         HPET_Tn_CMP(0));
+
+    /* Restart the main counter, and legacy mode. */
+    hpet_write32(cfg | HPET_CFG_ENABLE | HPET_CFG_LEGACY, HPET_CFG);
+
+    return true;
+}
 
 u64 __init hpet_setup(void)
 {
-    static u64 __initdata hpet_rate;
-    unsigned int hpet_id, hpet_period, hpet_cfg;
+    unsigned int hpet_id, hpet_period;
     unsigned int last, rem;
 
     if ( hpet_rate )
@@ -805,58 +864,7 @@ u64 __init hpet_setup(void)
      * Reconfigure the HPET into legacy mode to re-establish the timer
      * interrupt.
      */
-    if ( hpet_id & HPET_ID_LEGSUP &&
-         !((hpet_cfg = hpet_read32(HPET_CFG)) & HPET_CFG_LEGACY) )
-    {
-        unsigned int c0_cfg, ticks, count;
-
-        /* Stop the main counter. */
-        hpet_write32(hpet_cfg & ~HPET_CFG_ENABLE, HPET_CFG);
-
-        /* Reconfigure channel 0 to be 32bit periodic. */
-        c0_cfg = hpet_read32(HPET_Tn_CFG(0));
-        c0_cfg |= (HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
-                   HPET_TN_32BIT);
-        hpet_write32(c0_cfg, HPET_Tn_CFG(0));
-
-        /*
-         * The exact period doesn't have to match a legacy PIT.  All we need
-         * is an interrupt queued up via the IO-APIC to check routing.
-         *
-         * Use HZ as the frequency.
-         */
-        ticks = ((SECONDS(1) / HZ) * div_sc(hpet_rate, SECONDS(1), 32)) >> 32;
-
-        count = hpet_read32(HPET_COUNTER);
-
-        /*
-         * HPET_TN_SETVAL above is atrociously documented in the spec.
-         *
-         * Periodic HPET channels have a main comparator register, and
-         * separate "accumulator" register.  Despite being named accumulator
-         * in the spec, this is not an accurate description of its behaviour
-         * or purpose.
-         *
-         * Each time an interrupt is generated, the "accumulator" register is
-         * re-added to the comparator set up the new period.
-         *
-         * Normally, writes to the CMP register update both registers.
-         * However, under these semantics, it is impossible to set up a
-         * periodic timer correctly without the main HPET counter being at 0.
-         *
-         * Instead, HPET_TN_SETVAL is a self-clearing control bit which we can
-         * use for periodic timers to mean that the second write to CMP
-         * updates the accumulator only, and not the absolute comparator
-         * value.
-         *
-         * This lets us set a period when the main counter isn't at 0.
-         */
-        hpet_write32(count + ticks, HPET_Tn_CMP(0));
-        hpet_write32(ticks,         HPET_Tn_CMP(0));
-
-        /* Restart the main counter, and legacy mode. */
-        hpet_write32(hpet_cfg | HPET_CFG_ENABLE | HPET_CFG_LEGACY, HPET_CFG);
-    }
+    hpet_enable_legacy_replacement_mode();
 
     return hpet_rate;
 }
diff --git a/xen/include/asm-x86/hpet.h b/xen/include/asm-x86/hpet.h
index fb6bf05065..50176de3d2 100644
--- a/xen/include/asm-x86/hpet.h
+++ b/xen/include/asm-x86/hpet.h
@@ -73,6 +73,12 @@ void hpet_disable(void);
 int hpet_legacy_irq_tick(void);
 
 /*
+ * Try to enable HPET Legacy Replacement mode.  Returns a boolean indicating
+ * whether the HPET configuration was changed.
+ */
+bool hpet_enable_legacy_replacement_mode(void);
+
+/*
  * Temporarily use an HPET event counter for timer interrupt handling,
  * rather than using the LAPIC timer. Used for Cx state entry.
  */
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally
  2021-03-25 16:52 [PATCH for-4.15 0/2] x86/hpet: Try to unbreak Ryzen 1800X systems Andrew Cooper
  2021-03-25 16:52 ` [PATCH 1/2] x86/hpet: Factor hpet_enable_legacy_replacement_mode() out of hpet_setup() Andrew Cooper
@ 2021-03-25 16:52 ` Andrew Cooper
  2021-03-25 17:00   ` Andrew Cooper
  2021-03-25 17:21   ` [PATCH v1.1 " Andrew Cooper
  1 sibling, 2 replies; 27+ messages in thread
From: Andrew Cooper @ 2021-03-25 16:52 UTC (permalink / raw)
  To: Xen-devel
  Cc: Jan Beulich, Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Ian Jackson, Marek Marczykowski-Górecki,
	Frédéric Pierret

From: Jan Beulich <jbeulich@suse.com>

Commit e1de4c196a2e ("x86/timer: Fix boot on Intel systems using ITSSPRC
static PIT clock gating") was reported to cause boot failures on certain
AMD Ryzen systems.

Refine the fix to do nothing in the default case, and only attempt to
configure legacy replacement mode if IRQ0 is found to not be working.

In addition, introduce a "hpet" command line option so this heuristic
can be overridden.  Since it makes little sense to introduce just
"hpet=legacy-replacement", also allow for a boolean argument as well as
"broadcast" to replace the separate "hpetbroadcast" option.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Ian Jackson <iwj@xenproject.org>
CC: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
CC: Frédéric Pierret <frederic.pierret@qubes-os.org>

For 4.15: Attempt to unbreak AMD Ryzen 1800X systems.
---
 docs/misc/xen-command-line.pandoc | 33 ++++++++++++++++++++++++++++++
 xen/arch/x86/hpet.c               | 43 +++++++++++++++++++++++++++------------
 xen/arch/x86/io_apic.c            | 26 +++++++++++++++++++++++
 xen/include/asm-x86/hpet.h        |  1 +
 4 files changed, 90 insertions(+), 13 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index a0601ff838..4d020d4ad7 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1274,9 +1274,42 @@ supported. See docs/misc/arm/big.LITTLE.txt for more information.
 When the hmp-unsafe option is disabled (default), CPUs that are not
 identical to the boot CPU will be parked and not used by Xen.
 
+### hpet (x86)
+    = List of [ <bool> | broadcast=<bool> | legacy-replacement=<bool> ]
+
+    Applicability: x86
+
+Controls Xen's use of the system's High Precision Event Timer.  By default,
+Xen will use an HPET when available and not subject to errata.  Use of the
+HPET can be disabled by specifying `hpet=0`.
+
+ * The `broadcast` boolean is disabled by default, but forces Xen to keep
+   using the broadcast for CPUs in deep C-states even when an RTC interrupt is
+   enabled.  This then also affects raising of the RTC interrupt.
+
+ * The `legacy-replacement` boolean allows for control over whether Legacy
+   Replacement mode is enabled.
+
+   Legacy Replacement mode is intended for hardware which does not have an
+   8025 PIT, and allows the HPET to be configured into a compatible mode.
+   Intel chipsets from Skylake/ApolloLake onwards can turn the PIT off for
+   power saving reasons, and there is no platform-agnostic mechanism for
+   discovering this.
+
+   By default, Xen will not change hardware configuration, unless the PIT
+   appears to be absent, at which point Xen will try to enable Legacy
+   Replacement mode before falling back to pre-IO-APIC interrupt routing
+   options.
+
+   This behaviour can be inhibited by specifying `legacy-replacement=0`.
+   Alternatively, this mode can be enabled unconditionally (if available) by
+   specifying `legacy-replacement=1`.
+
 ### hpetbroadcast (x86)
 > `= <boolean>`
 
+Deprecated alternative of `hpet=broadcast`.
+
 ### hvm_debug (x86)
 > `= <integer>`
 
diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
index c73135bb15..270fef38c3 100644
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -52,6 +52,8 @@ static unsigned int __read_mostly num_hpets_used;
 DEFINE_PER_CPU(struct hpet_event_channel *, cpu_bc_channel);
 
 unsigned long __initdata hpet_address;
+int8_t __initdata opt_hpet_legacy_replacement = -1;
+static bool __initdata opt_hpet = true;
 u8 __initdata hpet_blockid;
 u8 __initdata hpet_flags;
 
@@ -63,6 +65,32 @@ u8 __initdata hpet_flags;
 static bool __initdata force_hpet_broadcast;
 boolean_param("hpetbroadcast", force_hpet_broadcast);
 
+static int __init parse_hpet_param(const char *s)
+{
+    const char *ss;
+    int val, rc = 0;
+
+    do {
+        ss = strchr(s, ',');
+        if ( !ss )
+            ss = strchr(s, '\0');
+
+        if ( (val = parse_bool(s, ss)) >= 0 )
+            opt_hpet = val;
+        else if ( (val = parse_boolean("broadcast", s, ss)) >= 0 )
+            force_hpet_broadcast = val;
+        else if ( (val = parse_boolean("legacy-replacement", s, ss)) >= 0 )
+            opt_hpet_legacy_replacement = val;
+        else
+            rc = -EINVAL;
+
+        s = ss + 1;
+    } while ( *ss );
+
+    return rc;
+}
+custom_param("hpet", parse_hpet_param);
+
 /*
  * Calculate a multiplication factor for scaled math, which is used to convert
  * nanoseconds based values to clock ticks:
@@ -852,19 +880,8 @@ u64 __init hpet_setup(void)
     if ( (rem * 2) > hpet_period )
         hpet_rate++;
 
-    /*
-     * Intel chipsets from Skylake/ApolloLake onwards can statically clock
-     * gate the 8259 PIT.  This option is enabled by default in slightly later
-     * systems, as turning the PIT off is a prerequisite to entering the C11
-     * power saving state.
-     *
-     * Xen currently depends on the legacy timer interrupt being active while
-     * IRQ routing is configured.
-     *
-     * Reconfigure the HPET into legacy mode to re-establish the timer
-     * interrupt.
-     */
-    hpet_enable_legacy_replacement_mode();
+    if ( opt_hpet_legacy_replacement > 0 )
+        hpet_enable_legacy_replacement_mode();
 
     return hpet_rate;
 }
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index e93265f379..f08c60d71f 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -29,6 +29,8 @@
 #include <xen/acpi.h>
 #include <xen/keyhandler.h>
 #include <xen/softirq.h>
+
+#include <asm/hpet.h>
 #include <asm/mc146818rtc.h>
 #include <asm/smp.h>
 #include <asm/desc.h>
@@ -1922,14 +1924,38 @@ static void __init check_timer(void)
            vector, apic1, pin1, apic2, pin2);
 
     if (pin1 != -1) {
+        bool hpet_changed = false;
+
         /*
          * Ok, does IRQ0 through the IOAPIC work?
          */
         unmask_IO_APIC_irq(irq_to_desc(0));
+    retry_ioapic_pin:
         if (timer_irq_works()) {
             local_irq_restore(flags);
             return;
         }
+
+        /*
+         * Intel chipsets from Skylake/ApolloLake onwards can statically clock
+         * gate the 8259 PIT.  This option is enabled by default in slightly
+         * later systems, as turning the PIT off is a prerequisite to entering
+         * the C11 power saving state.
+         *
+         * Xen currently depends on the legacy timer interrupt being active
+         * while IRQ routing is configured.
+         *
+         * If the user hasn't made an explicit option, attempt to reconfigure
+         * the HPET into legacy mode to re-establish the timer interrupt.
+         */
+        if ( opt_hpet_legacy_replacement < 0 &&
+             !hpet_changed && hpet_enable_legacy_replacement_mode() )
+        {
+            printk(XENLOG_ERR "..no 8254 timer found - trying HPET Legacy Replacement Mode\n");
+            hpet_changed = true;
+            goto retry_ioapic_pin;
+        }
+
         clear_IO_APIC_pin(apic1, pin1);
         printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to "
                "IO-APIC\n");
diff --git a/xen/include/asm-x86/hpet.h b/xen/include/asm-x86/hpet.h
index 50176de3d2..07bc8d6079 100644
--- a/xen/include/asm-x86/hpet.h
+++ b/xen/include/asm-x86/hpet.h
@@ -53,6 +53,7 @@
 extern unsigned long hpet_address;
 extern u8 hpet_blockid;
 extern u8 hpet_flags;
+extern int8_t opt_hpet_legacy_replacement;
 
 /*
  * Detect and initialise HPET hardware: return counter update frequency.
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally
  2021-03-25 16:52 ` [PATCH 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally Andrew Cooper
@ 2021-03-25 17:00   ` Andrew Cooper
  2021-03-25 17:21   ` [PATCH v1.1 " Andrew Cooper
  1 sibling, 0 replies; 27+ messages in thread
From: Andrew Cooper @ 2021-03-25 17:00 UTC (permalink / raw)
  To: Xen-devel
  Cc: Jan Beulich, Roger Pau Monné,
	Wei Liu, Ian Jackson, Marek Marczykowski-Górecki,
	Frédéric Pierret

On 25/03/2021 16:52, Andrew Cooper wrote:
> From: Jan Beulich <jbeulich@suse.com>
>
> Commit e1de4c196a2e ("x86/timer: Fix boot on Intel systems using ITSSPRC
> static PIT clock gating") was reported to cause boot failures on certain
> AMD Ryzen systems.
>
> Refine the fix to do nothing in the default case, and only attempt to
> configure legacy replacement mode if IRQ0 is found to not be working.
>
> In addition, introduce a "hpet" command line option so this heuristic
> can be overridden.  Since it makes little sense to introduce just
> "hpet=legacy-replacement", also allow for a boolean argument as well as
> "broadcast" to replace the separate "hpetbroadcast" option.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Wei Liu <wl@xen.org>
> CC: Ian Jackson <iwj@xenproject.org>
> CC: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> CC: Frédéric Pierret <frederic.pierret@qubes-os.org>
>
> For 4.15: Attempt to unbreak AMD Ryzen 1800X systems.

Sorry - lost a hunk during a rebase (the one to cope with hpet=0).  I'll
fold that in and post a v1.1 in due course.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally
  2021-03-25 16:52 ` [PATCH 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally Andrew Cooper
  2021-03-25 17:00   ` Andrew Cooper
@ 2021-03-25 17:21   ` Andrew Cooper
  2021-03-26  9:51     ` Jan Beulich
                       ` (2 more replies)
  1 sibling, 3 replies; 27+ messages in thread
From: Andrew Cooper @ 2021-03-25 17:21 UTC (permalink / raw)
  To: Xen-devel
  Cc: Jan Beulich, Andrew Cooper, Jan Beulich, Roger Pau Monné,
	Wei Liu, Ian Jackson, Marek Marczykowski-Górecki,
	Frédéric Pierret

From: Jan Beulich <jbeulich@suse.com>

Commit e1de4c196a2e ("x86/timer: Fix boot on Intel systems using ITSSPRC
static PIT clock gating") was reported to cause boot failures on certain
AMD Ryzen systems.

Refine the fix to do nothing in the default case, and only attempt to
configure legacy replacement mode if IRQ0 is found to not be working.

In addition, introduce a "hpet" command line option so this heuristic
can be overridden.  Since it makes little sense to introduce just
"hpet=legacy-replacement", also allow for a boolean argument as well as
"broadcast" to replace the separate "hpetbroadcast" option.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Ian Jackson <iwj@xenproject.org>
CC: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
CC: Frédéric Pierret <frederic.pierret@qubes-os.org>

v2:
 * Drop missing hunk from Jan's original patch.

For 4.15: Attempt to unbreak AMD Ryzen 1800X systems.
---
 docs/misc/xen-command-line.pandoc | 33 +++++++++++++++++++++++++++
 xen/arch/x86/hpet.c               | 48 +++++++++++++++++++++++++--------------
 xen/arch/x86/io_apic.c            | 26 +++++++++++++++++++++
 xen/include/asm-x86/hpet.h        |  1 +
 4 files changed, 91 insertions(+), 17 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index a0601ff838..4d020d4ad7 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1274,9 +1274,42 @@ supported. See docs/misc/arm/big.LITTLE.txt for more information.
 When the hmp-unsafe option is disabled (default), CPUs that are not
 identical to the boot CPU will be parked and not used by Xen.
 
+### hpet (x86)
+    = List of [ <bool> | broadcast=<bool> | legacy-replacement=<bool> ]
+
+    Applicability: x86
+
+Controls Xen's use of the system's High Precision Event Timer.  By default,
+Xen will use an HPET when available and not subject to errata.  Use of the
+HPET can be disabled by specifying `hpet=0`.
+
+ * The `broadcast` boolean is disabled by default, but forces Xen to keep
+   using the broadcast for CPUs in deep C-states even when an RTC interrupt is
+   enabled.  This then also affects raising of the RTC interrupt.
+
+ * The `legacy-replacement` boolean allows for control over whether Legacy
+   Replacement mode is enabled.
+
+   Legacy Replacement mode is intended for hardware which does not have an
+   8025 PIT, and allows the HPET to be configured into a compatible mode.
+   Intel chipsets from Skylake/ApolloLake onwards can turn the PIT off for
+   power saving reasons, and there is no platform-agnostic mechanism for
+   discovering this.
+
+   By default, Xen will not change hardware configuration, unless the PIT
+   appears to be absent, at which point Xen will try to enable Legacy
+   Replacement mode before falling back to pre-IO-APIC interrupt routing
+   options.
+
+   This behaviour can be inhibited by specifying `legacy-replacement=0`.
+   Alternatively, this mode can be enabled unconditionally (if available) by
+   specifying `legacy-replacement=1`.
+
 ### hpetbroadcast (x86)
 > `= <boolean>`
 
+Deprecated alternative of `hpet=broadcast`.
+
 ### hvm_debug (x86)
 > `= <integer>`
 
diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
index c73135bb15..957e053a47 100644
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -52,6 +52,8 @@ static unsigned int __read_mostly num_hpets_used;
 DEFINE_PER_CPU(struct hpet_event_channel *, cpu_bc_channel);
 
 unsigned long __initdata hpet_address;
+int8_t __initdata opt_hpet_legacy_replacement = -1;
+static bool __initdata opt_hpet = true;
 u8 __initdata hpet_blockid;
 u8 __initdata hpet_flags;
 
@@ -63,6 +65,32 @@ u8 __initdata hpet_flags;
 static bool __initdata force_hpet_broadcast;
 boolean_param("hpetbroadcast", force_hpet_broadcast);
 
+static int __init parse_hpet_param(const char *s)
+{
+    const char *ss;
+    int val, rc = 0;
+
+    do {
+        ss = strchr(s, ',');
+        if ( !ss )
+            ss = strchr(s, '\0');
+
+        if ( (val = parse_bool(s, ss)) >= 0 )
+            opt_hpet = val;
+        else if ( (val = parse_boolean("broadcast", s, ss)) >= 0 )
+            force_hpet_broadcast = val;
+        else if ( (val = parse_boolean("legacy-replacement", s, ss)) >= 0 )
+            opt_hpet_legacy_replacement = val;
+        else
+            rc = -EINVAL;
+
+        s = ss + 1;
+    } while ( *ss );
+
+    return rc;
+}
+custom_param("hpet", parse_hpet_param);
+
 /*
  * Calculate a multiplication factor for scaled math, which is used to convert
  * nanoseconds based values to clock ticks:
@@ -820,12 +848,9 @@ u64 __init hpet_setup(void)
     unsigned int hpet_id, hpet_period;
     unsigned int last, rem;
 
-    if ( hpet_rate )
+    if ( hpet_rate || !hpet_address || !opt_hpet )
         return hpet_rate;
 
-    if ( hpet_address == 0 )
-        return 0;
-
     set_fixmap_nocache(FIX_HPET_BASE, hpet_address);
 
     hpet_id = hpet_read32(HPET_ID);
@@ -852,19 +877,8 @@ u64 __init hpet_setup(void)
     if ( (rem * 2) > hpet_period )
         hpet_rate++;
 
-    /*
-     * Intel chipsets from Skylake/ApolloLake onwards can statically clock
-     * gate the 8259 PIT.  This option is enabled by default in slightly later
-     * systems, as turning the PIT off is a prerequisite to entering the C11
-     * power saving state.
-     *
-     * Xen currently depends on the legacy timer interrupt being active while
-     * IRQ routing is configured.
-     *
-     * Reconfigure the HPET into legacy mode to re-establish the timer
-     * interrupt.
-     */
-    hpet_enable_legacy_replacement_mode();
+    if ( opt_hpet_legacy_replacement > 0 )
+        hpet_enable_legacy_replacement_mode();
 
     return hpet_rate;
 }
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index e93265f379..f08c60d71f 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -29,6 +29,8 @@
 #include <xen/acpi.h>
 #include <xen/keyhandler.h>
 #include <xen/softirq.h>
+
+#include <asm/hpet.h>
 #include <asm/mc146818rtc.h>
 #include <asm/smp.h>
 #include <asm/desc.h>
@@ -1922,14 +1924,38 @@ static void __init check_timer(void)
            vector, apic1, pin1, apic2, pin2);
 
     if (pin1 != -1) {
+        bool hpet_changed = false;
+
         /*
          * Ok, does IRQ0 through the IOAPIC work?
          */
         unmask_IO_APIC_irq(irq_to_desc(0));
+    retry_ioapic_pin:
         if (timer_irq_works()) {
             local_irq_restore(flags);
             return;
         }
+
+        /*
+         * Intel chipsets from Skylake/ApolloLake onwards can statically clock
+         * gate the 8259 PIT.  This option is enabled by default in slightly
+         * later systems, as turning the PIT off is a prerequisite to entering
+         * the C11 power saving state.
+         *
+         * Xen currently depends on the legacy timer interrupt being active
+         * while IRQ routing is configured.
+         *
+         * If the user hasn't made an explicit option, attempt to reconfigure
+         * the HPET into legacy mode to re-establish the timer interrupt.
+         */
+        if ( opt_hpet_legacy_replacement < 0 &&
+             !hpet_changed && hpet_enable_legacy_replacement_mode() )
+        {
+            printk(XENLOG_ERR "..no 8254 timer found - trying HPET Legacy Replacement Mode\n");
+            hpet_changed = true;
+            goto retry_ioapic_pin;
+        }
+
         clear_IO_APIC_pin(apic1, pin1);
         printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to "
                "IO-APIC\n");
diff --git a/xen/include/asm-x86/hpet.h b/xen/include/asm-x86/hpet.h
index 50176de3d2..07bc8d6079 100644
--- a/xen/include/asm-x86/hpet.h
+++ b/xen/include/asm-x86/hpet.h
@@ -53,6 +53,7 @@
 extern unsigned long hpet_address;
 extern u8 hpet_blockid;
 extern u8 hpet_flags;
+extern int8_t opt_hpet_legacy_replacement;
 
 /*
  * Detect and initialise HPET hardware: return counter update frequency.
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally
  2021-03-25 17:21   ` [PATCH v1.1 " Andrew Cooper
@ 2021-03-26  9:51     ` Jan Beulich
  2021-03-26 16:32       ` Andrew Cooper
  2021-03-26 12:03     ` Ian Jackson
  2021-03-26 13:50     ` Frédéric Pierret
  2 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2021-03-26  9:51 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Ian Jackson, Marek Marczykowski-Górecki,
	Frédéric Pierret, Xen-devel

On 25.03.2021 18:21, Andrew Cooper wrote:
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1274,9 +1274,42 @@ supported. See docs/misc/arm/big.LITTLE.txt for more information.
>  When the hmp-unsafe option is disabled (default), CPUs that are not
>  identical to the boot CPU will be parked and not used by Xen.
>  
> +### hpet (x86)
> +    = List of [ <bool> | broadcast=<bool> | legacy-replacement=<bool> ]
> +
> +    Applicability: x86

If this is the more modern form to express this information, then the
(x86) I did put on the sub-title line should imo be dropped.

> +Controls Xen's use of the system's High Precision Event Timer.  By default,
> +Xen will use an HPET when available and not subject to errata.  Use of the
> +HPET can be disabled by specifying `hpet=0`.
> +
> + * The `broadcast` boolean is disabled by default, but forces Xen to keep
> +   using the broadcast for CPUs in deep C-states even when an RTC interrupt is
> +   enabled.  This then also affects raising of the RTC interrupt.
> +
> + * The `legacy-replacement` boolean allows for control over whether Legacy
> +   Replacement mode is enabled.
> +
> +   Legacy Replacement mode is intended for hardware which does not have an
> +   8025 PIT, and allows the HPET to be configured into a compatible mode.

8254 ?

> @@ -1922,14 +1924,38 @@ static void __init check_timer(void)
>             vector, apic1, pin1, apic2, pin2);
>  
>      if (pin1 != -1) {
> +        bool hpet_changed = false;
> +
>          /*
>           * Ok, does IRQ0 through the IOAPIC work?
>           */
>          unmask_IO_APIC_irq(irq_to_desc(0));
> +    retry_ioapic_pin:
>          if (timer_irq_works()) {
>              local_irq_restore(flags);
>              return;
>          }
> +
> +        /*
> +         * Intel chipsets from Skylake/ApolloLake onwards can statically clock
> +         * gate the 8259 PIT.  This option is enabled by default in slightly

8254?

> +         * later systems, as turning the PIT off is a prerequisite to entering
> +         * the C11 power saving state.
> +         *
> +         * Xen currently depends on the legacy timer interrupt being active
> +         * while IRQ routing is configured.
> +         *
> +         * If the user hasn't made an explicit option, attempt to reconfigure

s/option/choice/ or s/made/given/?

> +         * the HPET into legacy mode to re-establish the timer interrupt.
> +         */
> +        if ( opt_hpet_legacy_replacement < 0 &&
> +             !hpet_changed && hpet_enable_legacy_replacement_mode() )
> +        {
> +            printk(XENLOG_ERR "..no 8254 timer found - trying HPET Legacy Replacement Mode\n");
> +            hpet_changed = true;
> +            goto retry_ioapic_pin;
> +        }
> +
>          clear_IO_APIC_pin(apic1, pin1);
>          printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to "
>                 "IO-APIC\n");

As mentioned on irc already, I'm somewhat concerned by doing this change
first (and also not undoing it if it didn't work). An AMD Turion based
laptop I was using many years ago required one of the other fallbacks to
be engaged, and hence I'd expect certain other (old?) systems to be
similarly affected. Sadly (for the purposes here) I don't have this
laptop anymore, so wouldn't be able to verify whether the above actually
breaks there.

As a minor remark, I find the "goto" based approach not very nice (we've
been generally saying we consider "goto" okay largely for simplification
of error handling, to avoid having many [partly] redundant function exit
paths), but I can see how using for() or while() or do/while() would
make the code larger and require deeper indentation.

Jan


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] x86/hpet: Factor hpet_enable_legacy_replacement_mode() out of hpet_setup()
  2021-03-25 16:52 ` [PATCH 1/2] x86/hpet: Factor hpet_enable_legacy_replacement_mode() out of hpet_setup() Andrew Cooper
@ 2021-03-26  9:59   ` Jan Beulich
  2021-03-26 10:53     ` Andrew Cooper
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2021-03-26  9:59 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Ian Jackson, Marek Marczykowski-Górecki,
	Frédéric Pierret, Xen-devel

On 25.03.2021 17:52, Andrew Cooper wrote:
> ... in preparation to introduce a second caller.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Generally
Reviewed-by: Jan Beulich <jbeulich@suse.com>
but I think there's one small code change needed plus I have two
nits (and I expect that this change wouldn't be committed without
patch 2, as making the function non-static isn't warranted
otherwise):

> --- a/xen/arch/x86/hpet.c
> +++ b/xen/arch/x86/hpet.c
> @@ -754,11 +754,70 @@ int hpet_legacy_irq_tick(void)
>  }
>  
>  static u32 *hpet_boot_cfg;
> +static u64 __initdata hpet_rate;

Use uint64_t as you move this here?

> +bool __init hpet_enable_legacy_replacement_mode(void)
> +{
> +    unsigned int id, cfg, c0_cfg, ticks, count;
> +
> +    if ( !hpet_rate ||

I think you need to also honor opt_hpet here.

> +         !((id = hpet_read32(HPET_ID)) & HPET_ID_LEGSUP) ||

I don't think I see a need for the assignment and hence the local
variable. Dropping it would make the code easier to read imo.

Jan


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] x86/hpet: Factor hpet_enable_legacy_replacement_mode() out of hpet_setup()
  2021-03-26  9:59   ` Jan Beulich
@ 2021-03-26 10:53     ` Andrew Cooper
  2021-03-26 10:55       ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Cooper @ 2021-03-26 10:53 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné,
	Wei Liu, Ian Jackson, Marek Marczykowski-Górecki,
	Frédéric Pierret, Xen-devel

On 26/03/2021 09:59, Jan Beulich wrote:
> On 25.03.2021 17:52, Andrew Cooper wrote:
>> ... in preparation to introduce a second caller.
>>
>> No functional change.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Generally
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks,

> but I think there's one small code change needed plus I have two
> nits (and I expect that this change wouldn't be committed without
> patch 2, as making the function non-static isn't warranted
> otherwise):

Yeah - I intend these to go in together.

>
>> --- a/xen/arch/x86/hpet.c
>> +++ b/xen/arch/x86/hpet.c
>> @@ -754,11 +754,70 @@ int hpet_legacy_irq_tick(void)
>>  }
>>  
>>  static u32 *hpet_boot_cfg;
>> +static u64 __initdata hpet_rate;
> Use uint64_t as you move this here?

Ok.

>
>> +bool __init hpet_enable_legacy_replacement_mode(void)
>> +{
>> +    unsigned int id, cfg, c0_cfg, ticks, count;
>> +
>> +    if ( !hpet_rate ||
> I think you need to also honor opt_hpet here.

Can't (order of patches), and also no need.

When opt_hpet is introduced, hpet_rate can't become nonzero without
opt_hpet being set.

>
>> +         !((id = hpet_read32(HPET_ID)) & HPET_ID_LEGSUP) ||
> I don't think I see a need for the assignment and hence the local
> variable. Dropping it would make the code easier to read imo.

Ok.

~Andrew


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] x86/hpet: Factor hpet_enable_legacy_replacement_mode() out of hpet_setup()
  2021-03-26 10:53     ` Andrew Cooper
@ 2021-03-26 10:55       ` Jan Beulich
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2021-03-26 10:55 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Ian Jackson, Marek Marczykowski-Górecki,
	Frédéric Pierret, Xen-devel

On 26.03.2021 11:53, Andrew Cooper wrote:
> On 26/03/2021 09:59, Jan Beulich wrote:
>> On 25.03.2021 17:52, Andrew Cooper wrote:
>>> +bool __init hpet_enable_legacy_replacement_mode(void)
>>> +{
>>> +    unsigned int id, cfg, c0_cfg, ticks, count;
>>> +
>>> +    if ( !hpet_rate ||
>> I think you need to also honor opt_hpet here.
> 
> Can't (order of patches), and also no need.
> 
> When opt_hpet is introduced, hpet_rate can't become nonzero without
> opt_hpet being set.

Oh, right, sorry: I did mix up hpet_address and hpet_rate.

Jan


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally
  2021-03-25 17:21   ` [PATCH v1.1 " Andrew Cooper
  2021-03-26  9:51     ` Jan Beulich
@ 2021-03-26 12:03     ` Ian Jackson
  2021-03-26 12:29       ` Ian Jackson
  2021-03-26 13:50     ` Frédéric Pierret
  2 siblings, 1 reply; 27+ messages in thread
From: Ian Jackson @ 2021-03-26 12:03 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Xen-devel, Jan Beulich, Roger Pau Monné,
	Wei Liu, Marek Marczykowski-Górecki,
	Frédéric Pierret

Andrew Cooper writes ("[PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally"):
> From: Jan Beulich <jbeulich@suse.com>
> 
> Commit e1de4c196a2e ("x86/timer: Fix boot on Intel systems using ITSSPRC
> static PIT clock gating") was reported to cause boot failures on certain
> AMD Ryzen systems.
> 
> Refine the fix to do nothing in the default case, and only attempt to
> configure legacy replacement mode if IRQ0 is found to not be working.
> 
> In addition, introduce a "hpet" command line option so this heuristic
> can be overridden.  Since it makes little sense to introduce just
> "hpet=legacy-replacement", also allow for a boolean argument as well as
> "broadcast" to replace the separate "hpetbroadcast" option.

I'm sorry, but I think it is too late for 4.15 to do this.  I prefer
Jan's patch which I have alread release-acked.

Can someone qualified please provide a maintainer review for this,
ideally today ?

Ian.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally
  2021-03-26 12:03     ` Ian Jackson
@ 2021-03-26 12:29       ` Ian Jackson
  2021-03-26 13:03         ` Tamas K Lengyel
  2021-03-26 13:30         ` Jan Beulich
  0 siblings, 2 replies; 27+ messages in thread
From: Ian Jackson @ 2021-03-26 12:29 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Andrew Cooper, Xen-devel, Jan Beulich, Wei Liu,
	Marek Marczykowski-Górecki, Frédéric Pierret

I wrote:
> I'm sorry, but I think it is too late for 4.15 to do this.  I prefer
> Jan's patch which I have alread release-acked.
> 
> Can someone qualified please provide a maintainer review for this,
> ideally today ?

I asked Andrew on IRC:

12:08 <Diziet> andyhhp__: Are you prepared to maintainer-ack Jan's
               more-minimal hpet workaround approach ?
12:16 <andyhhp__> Diziet: honestly, no.  I don't consider that
                  acceptable behaviour, and it is a fairly big "f you"
                  (this was literally feedback I got in private) to
                  the downstreams who've spent years trying to get us
                  to fix this bug, and have now backported the first
                  version.
12:16 <andyhhp__> I'm looking into the feedback on my series
12:17 <andyhhp__> one way or another, the moment we enter the fallback
                  path for interrupt routing, something is very broken
                  on the system
12:19 <andyhhp__> so the tradeoff is an unspecified bug on one ancient
                  laptop which can't be tested now, vs 5 years of Atom
                  CPUs, 2 years of latop CPUs, and the forthcoming
                  Server line of Intel CPUs
12:19 <andyhhp__> or whatever other compromise we can work on

I'm sorry that this bug is going to continue to be not properly fixed.
As I understand it the practical impact is that users of those
affected systems (the newer ones you mention) will have to add a
command-line option.  That is, unfortunately, the downside of
time-based releases.  If we had been having this conversation two
weeks ago I would have very likely had a different answer.

I consider the current situation in xen.git#staging-4.15 a blocker for
the release and I want to get the code finalised.  I hope that
Monday's RC will be the last RC and that after that there will be only
docs changes.  That would mean committing a workaround today.

Roger, would you be able to give me a maintainer review of Jan's

 [PATCH][4.15] x86/HPET: don't enable legacy replacement mode unconditionally

?

Andrew, I don't think you have, so far, Nak'd Jan's patch.  If you
feel it warrants your Nak please provide it ASAP.

Thanks,
Ian.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally
  2021-03-26 12:29       ` Ian Jackson
@ 2021-03-26 13:03         ` Tamas K Lengyel
  2021-03-26 13:15           ` Ian Jackson
  2021-03-26 13:31           ` Jan Beulich
  2021-03-26 13:30         ` Jan Beulich
  1 sibling, 2 replies; 27+ messages in thread
From: Tamas K Lengyel @ 2021-03-26 13:03 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Roger Pau Monné,
	Andrew Cooper, Xen-devel, Jan Beulich, Wei Liu,
	Marek Marczykowski-Górecki, Frédéric Pierret

On Fri, Mar 26, 2021 at 8:30 AM Ian Jackson <iwj@xenproject.org> wrote:
>
> I wrote:
> > I'm sorry, but I think it is too late for 4.15 to do this.  I prefer
> > Jan's patch which I have alread release-acked.
> >
> > Can someone qualified please provide a maintainer review for this,
> > ideally today ?
>
> I asked Andrew on IRC:
>
> 12:08 <Diziet> andyhhp__: Are you prepared to maintainer-ack Jan's
>                more-minimal hpet workaround approach ?
> 12:16 <andyhhp__> Diziet: honestly, no.  I don't consider that
>                   acceptable behaviour, and it is a fairly big "f you"
>                   (this was literally feedback I got in private) to
>                   the downstreams who've spent years trying to get us
>                   to fix this bug, and have now backported the first
>                   version.
> 12:16 <andyhhp__> I'm looking into the feedback on my series
> 12:17 <andyhhp__> one way or another, the moment we enter the fallback
>                   path for interrupt routing, something is very broken
>                   on the system
> 12:19 <andyhhp__> so the tradeoff is an unspecified bug on one ancient
>                   laptop which can't be tested now, vs 5 years of Atom
>                   CPUs, 2 years of latop CPUs, and the forthcoming
>                   Server line of Intel CPUs
> 12:19 <andyhhp__> or whatever other compromise we can work on
>
> I'm sorry that this bug is going to continue to be not properly fixed.
> As I understand it the practical impact is that users of those
> affected systems (the newer ones you mention) will have to add a
> command-line option.  That is, unfortunately, the downside of
> time-based releases.  If we had been having this conversation two
> weeks ago I would have very likely had a different answer.

The problem from my perspective is that the end-users have no way to
determine when that boot option is needing to be set. Having an
installation step of "check if things explode when you reboot" is just
plain bad. Many times you don't even have access to a remote serial
console to check why Xen didn't boot. So that's not a realistic route
that can be taken. If Jan's patch is applied then the only thing I'll
be able to do is make all installations always-enable this option even
on systems that would have booted fine otherwise without it. It is
unclear if that has any downsides of its own and could very well just
kick the can down the road and lead to other issues.

Tamas


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally
  2021-03-26 13:03         ` Tamas K Lengyel
@ 2021-03-26 13:15           ` Ian Jackson
  2021-03-26 13:37             ` Jan Beulich
                               ` (2 more replies)
  2021-03-26 13:31           ` Jan Beulich
  1 sibling, 3 replies; 27+ messages in thread
From: Ian Jackson @ 2021-03-26 13:15 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Roger Pau Monné,
	Andrew Cooper, Xen-devel, Jan Beulich, Wei Liu,
	Marek Marczykowski-Górecki, Frédéric Pierret

Tamas K Lengyel writes ("Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally"):
> The problem from my perspective is that the end-users have no way to
> determine when that boot option is needing to be set. Having an
> installation step of "check if things explode when you reboot" is just
> plain bad. Many times you don't even have access to a remote serial
> console to check why Xen didn't boot. So that's not a realistic route
> that can be taken. If Jan's patch is applied then the only thing I'll
> be able to do is make all installations always-enable this option even
> on systems that would have booted fine otherwise without it. It is
> unclear if that has any downsides of its own and could very well just
> kick the can down the road and lead to other issues.

Thanks for the clear explanation.

I think our options are:

 1. What is currently in xen.git#staging-4.15: some different set of
    machines do not work (reliably? at all?), constituting a
    regression on older hardware.

 2. Jan's patch, with the consequences you describe.  Constituing a
    continued failure to properly support the newer hardware.

 3. Andy's patches which are not finished yet and are therefore high
    risk.  Ie, delay the release.

Please let me know if you think this characterisation of the situation
is inaccurate or misleading.

This is not a good set of options.

Of them, I still think I would choose (2).  But I would love it if
someone were to come up with a better suggestion (perhaps a variant on
one of the above).

Ian.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally
  2021-03-26 12:29       ` Ian Jackson
  2021-03-26 13:03         ` Tamas K Lengyel
@ 2021-03-26 13:30         ` Jan Beulich
  2021-03-26 13:43           ` Marek Marczykowski-Górecki
  1 sibling, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2021-03-26 13:30 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Andrew Cooper, Xen-devel, Wei Liu,
	Marek Marczykowski-Górecki, Frédéric Pierret,
	Roger Pau Monné

On 26.03.2021 13:29, Ian Jackson wrote:
> I wrote:
>> I'm sorry, but I think it is too late for 4.15 to do this.  I prefer
>> Jan's patch which I have alread release-acked.
>>
>> Can someone qualified please provide a maintainer review for this,
>> ideally today ?
> 
> I asked Andrew on IRC:
> 
> 12:08 <Diziet> andyhhp__: Are you prepared to maintainer-ack Jan's
>                more-minimal hpet workaround approach ?
> 12:16 <andyhhp__> Diziet: honestly, no.  I don't consider that
>                   acceptable behaviour, and it is a fairly big "f you"
>                   (this was literally feedback I got in private) to
>                   the downstreams who've spent years trying to get us
>                   to fix this bug, and have now backported the first
>                   version.
> 12:16 <andyhhp__> I'm looking into the feedback on my series
> 12:17 <andyhhp__> one way or another, the moment we enter the fallback
>                   path for interrupt routing, something is very broken
>                   on the system
> 12:19 <andyhhp__> so the tradeoff is an unspecified bug on one ancient
>                   laptop which can't be tested now, vs 5 years of Atom
>                   CPUs, 2 years of latop CPUs, and the forthcoming
>                   Server line of Intel CPUs
> 12:19 <andyhhp__> or whatever other compromise we can work on
> 
> I'm sorry that this bug is going to continue to be not properly fixed.

Actually I had another thought here in the morning, but then didn't
write it down: While Andrew's approach indeed would (hopefully)
improve user experience, it'll reduce the incentive of actually
fixing the issue. Normally I might not be that concerned, but seeing
how long it took to even arrive at a workaround, I'm afraid now I am
concerned.

Jan


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally
  2021-03-26 13:03         ` Tamas K Lengyel
  2021-03-26 13:15           ` Ian Jackson
@ 2021-03-26 13:31           ` Jan Beulich
  2021-03-26 13:39             ` Ian Jackson
  1 sibling, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2021-03-26 13:31 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: Roger Pau Monné,
	Andrew Cooper, Xen-devel, Wei Liu,
	Marek Marczykowski-Górecki, Frédéric Pierret,
	Ian Jackson

On 26.03.2021 14:03, Tamas K Lengyel wrote:
> On Fri, Mar 26, 2021 at 8:30 AM Ian Jackson <iwj@xenproject.org> wrote:
>>
>> I wrote:
>>> I'm sorry, but I think it is too late for 4.15 to do this.  I prefer
>>> Jan's patch which I have alread release-acked.
>>>
>>> Can someone qualified please provide a maintainer review for this,
>>> ideally today ?
>>
>> I asked Andrew on IRC:
>>
>> 12:08 <Diziet> andyhhp__: Are you prepared to maintainer-ack Jan's
>>                more-minimal hpet workaround approach ?
>> 12:16 <andyhhp__> Diziet: honestly, no.  I don't consider that
>>                   acceptable behaviour, and it is a fairly big "f you"
>>                   (this was literally feedback I got in private) to
>>                   the downstreams who've spent years trying to get us
>>                   to fix this bug, and have now backported the first
>>                   version.
>> 12:16 <andyhhp__> I'm looking into the feedback on my series
>> 12:17 <andyhhp__> one way or another, the moment we enter the fallback
>>                   path for interrupt routing, something is very broken
>>                   on the system
>> 12:19 <andyhhp__> so the tradeoff is an unspecified bug on one ancient
>>                   laptop which can't be tested now, vs 5 years of Atom
>>                   CPUs, 2 years of latop CPUs, and the forthcoming
>>                   Server line of Intel CPUs
>> 12:19 <andyhhp__> or whatever other compromise we can work on
>>
>> I'm sorry that this bug is going to continue to be not properly fixed.
>> As I understand it the practical impact is that users of those
>> affected systems (the newer ones you mention) will have to add a
>> command-line option.  That is, unfortunately, the downside of
>> time-based releases.  If we had been having this conversation two
>> weeks ago I would have very likely had a different answer.
> 
> The problem from my perspective is that the end-users have no way to
> determine when that boot option is needing to be set. Having an
> installation step of "check if things explode when you reboot" is just
> plain bad. Many times you don't even have access to a remote serial
> console to check why Xen didn't boot.

I guess I don't see the serial console aspect here: This sort of
boot issue can be seen on the normal screen as well. It occurs
neither too early nor too late to be visible. We could amend the
output by a hint towards this option.

Jan


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally
  2021-03-26 13:15           ` Ian Jackson
@ 2021-03-26 13:37             ` Jan Beulich
  2021-03-26 14:07               ` Ian Jackson
  2021-03-26 14:23             ` Marek Marczykowski-Górecki
  2021-03-29 13:30             ` Roger Pau Monné
  2 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2021-03-26 13:37 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Roger Pau Monné,
	Andrew Cooper, Xen-devel, Wei Liu,
	Marek Marczykowski-Górecki, Frédéric Pierret,
	Tamas K Lengyel

On 26.03.2021 14:15, Ian Jackson wrote:
> Tamas K Lengyel writes ("Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally"):
>> The problem from my perspective is that the end-users have no way to
>> determine when that boot option is needing to be set. Having an
>> installation step of "check if things explode when you reboot" is just
>> plain bad. Many times you don't even have access to a remote serial
>> console to check why Xen didn't boot. So that's not a realistic route
>> that can be taken. If Jan's patch is applied then the only thing I'll
>> be able to do is make all installations always-enable this option even
>> on systems that would have booted fine otherwise without it. It is
>> unclear if that has any downsides of its own and could very well just
>> kick the can down the road and lead to other issues.
> 
> Thanks for the clear explanation.
> 
> I think our options are:
> 
>  1. What is currently in xen.git#staging-4.15: some different set of
>     machines do not work (reliably? at all?), constituting a
>     regression on older hardware.
> 
>  2. Jan's patch, with the consequences you describe.  Constituing a
>     continued failure to properly support the newer hardware.
> 
>  3. Andy's patches which are not finished yet and are therefore high
>     risk.  Ie, delay the release.
> 
> Please let me know if you think this characterisation of the situation
> is inaccurate or misleading.
> 
> This is not a good set of options.
> 
> Of them, I still think I would choose (2).  But I would love it if
> someone were to come up with a better suggestion (perhaps a variant on
> one of the above).

TBH delaying the release for this specific problem should be seriously
considered imo. In principle I'm in favor of (3) of the above, if there
weren't the downsides I did mention in prior mails.

Jan


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally
  2021-03-26 13:31           ` Jan Beulich
@ 2021-03-26 13:39             ` Ian Jackson
  0 siblings, 0 replies; 27+ messages in thread
From: Ian Jackson @ 2021-03-26 13:39 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tamas K Lengyel, Roger Pau Monné,
	Andrew Cooper, Xen-devel, Wei Liu,
	Marek Marczykowski-Górecki, Frédéric Pierret

Jan Beulich writes ("Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally"):
> I guess I don't see the serial console aspect here: This sort of
> boot issue can be seen on the normal screen as well. It occurs
> neither too early nor too late to be visible. We could amend the
> output by a hint towards this option.

Changes to message strings would be fine even if done next week.

It looks like I am going to have to do the code review of this change
myself, if I want it today ?  This is far from ideal as I am no expert
in this area.

Ian.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally
  2021-03-26 13:30         ` Jan Beulich
@ 2021-03-26 13:43           ` Marek Marczykowski-Górecki
  2021-03-26 14:02             ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-03-26 13:43 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Ian Jackson, Andrew Cooper, Xen-devel, Wei Liu,
	Frédéric Pierret, Roger Pau Monné

[-- Attachment #1: Type: text/plain, Size: 2540 bytes --]

On Fri, Mar 26, 2021 at 02:30:09PM +0100, Jan Beulich wrote:
> On 26.03.2021 13:29, Ian Jackson wrote:
> > I wrote:
> >> I'm sorry, but I think it is too late for 4.15 to do this.  I prefer
> >> Jan's patch which I have alread release-acked.
> >>
> >> Can someone qualified please provide a maintainer review for this,
> >> ideally today ?
> > 
> > I asked Andrew on IRC:
> > 
> > 12:08 <Diziet> andyhhp__: Are you prepared to maintainer-ack Jan's
> >                more-minimal hpet workaround approach ?
> > 12:16 <andyhhp__> Diziet: honestly, no.  I don't consider that
> >                   acceptable behaviour, and it is a fairly big "f you"
> >                   (this was literally feedback I got in private) to
> >                   the downstreams who've spent years trying to get us
> >                   to fix this bug, and have now backported the first
> >                   version.
> > 12:16 <andyhhp__> I'm looking into the feedback on my series
> > 12:17 <andyhhp__> one way or another, the moment we enter the fallback
> >                   path for interrupt routing, something is very broken
> >                   on the system
> > 12:19 <andyhhp__> so the tradeoff is an unspecified bug on one ancient
> >                   laptop which can't be tested now, vs 5 years of Atom
> >                   CPUs, 2 years of latop CPUs, and the forthcoming
> >                   Server line of Intel CPUs
> > 12:19 <andyhhp__> or whatever other compromise we can work on
> > 
> > I'm sorry that this bug is going to continue to be not properly fixed.
> 
> Actually I had another thought here in the morning, but then didn't
> write it down: While Andrew's approach indeed would (hopefully)
> improve user experience, it'll reduce the incentive of actually
> fixing the issue. Normally I might not be that concerned, but seeing
> how long it took to even arrive at a workaround, I'm afraid now I am
> concerned.

I assume "the issue" you meant "Xen using legacy stuff that stops being
supported by the hardware", right? Yes it is an issue. But for most
users of Xen, having it broken more likely will results in "lets switch
to something that works" (perhaps not after the first such case, but
this is definitely not the first one) instead of "lets spend some hours
on debugging this very low level issue".
And to be honest, this is more and more appealing option, even though
all the deficiencies of KVM...

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally
  2021-03-25 17:21   ` [PATCH v1.1 " Andrew Cooper
  2021-03-26  9:51     ` Jan Beulich
  2021-03-26 12:03     ` Ian Jackson
@ 2021-03-26 13:50     ` Frédéric Pierret
  2 siblings, 0 replies; 27+ messages in thread
From: Frédéric Pierret @ 2021-03-26 13:50 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Jan Beulich, Roger Pau Monné,
	Wei Liu, Ian Jackson, Marek Marczykowski-Górecki


[-- Attachment #1.1: Type: text/plain, Size: 9238 bytes --]


Hi,
I confirm your patch makes my Ryzen 7 1800X platform working again! To double check that I've not messed up with xen.gz on my /boot, adding hpet=legacy-replacement makes my computer reboot as the original issue.

I hope this will hit stable release! Thank you for that!

Best,
Frédéric

Le 3/25/21 à 6:21 PM, Andrew Cooper a écrit :
> From: Jan Beulich <jbeulich@suse.com>
> 
> Commit e1de4c196a2e ("x86/timer: Fix boot on Intel systems using ITSSPRC
> static PIT clock gating") was reported to cause boot failures on certain
> AMD Ryzen systems.
> 
> Refine the fix to do nothing in the default case, and only attempt to
> configure legacy replacement mode if IRQ0 is found to not be working.
> 
> In addition, introduce a "hpet" command line option so this heuristic
> can be overridden.  Since it makes little sense to introduce just
> "hpet=legacy-replacement", also allow for a boolean argument as well as
> "broadcast" to replace the separate "hpetbroadcast" option.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Wei Liu <wl@xen.org>
> CC: Ian Jackson <iwj@xenproject.org>
> CC: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> CC: Frédéric Pierret <frederic.pierret@qubes-os.org>
> 
> v2:
>   * Drop missing hunk from Jan's original patch.
> 
> For 4.15: Attempt to unbreak AMD Ryzen 1800X systems.
> ---
>   docs/misc/xen-command-line.pandoc | 33 +++++++++++++++++++++++++++
>   xen/arch/x86/hpet.c               | 48 +++++++++++++++++++++++++--------------
>   xen/arch/x86/io_apic.c            | 26 +++++++++++++++++++++
>   xen/include/asm-x86/hpet.h        |  1 +
>   4 files changed, 91 insertions(+), 17 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> index a0601ff838..4d020d4ad7 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1274,9 +1274,42 @@ supported. See docs/misc/arm/big.LITTLE.txt for more information.
>   When the hmp-unsafe option is disabled (default), CPUs that are not
>   identical to the boot CPU will be parked and not used by Xen.
>   
> +### hpet (x86)
> +    = List of [ <bool> | broadcast=<bool> | legacy-replacement=<bool> ]
> +
> +    Applicability: x86
> +
> +Controls Xen's use of the system's High Precision Event Timer.  By default,
> +Xen will use an HPET when available and not subject to errata.  Use of the
> +HPET can be disabled by specifying `hpet=0`.
> +
> + * The `broadcast` boolean is disabled by default, but forces Xen to keep
> +   using the broadcast for CPUs in deep C-states even when an RTC interrupt is
> +   enabled.  This then also affects raising of the RTC interrupt.
> +
> + * The `legacy-replacement` boolean allows for control over whether Legacy
> +   Replacement mode is enabled.
> +
> +   Legacy Replacement mode is intended for hardware which does not have an
> +   8025 PIT, and allows the HPET to be configured into a compatible mode.
> +   Intel chipsets from Skylake/ApolloLake onwards can turn the PIT off for
> +   power saving reasons, and there is no platform-agnostic mechanism for
> +   discovering this.
> +
> +   By default, Xen will not change hardware configuration, unless the PIT
> +   appears to be absent, at which point Xen will try to enable Legacy
> +   Replacement mode before falling back to pre-IO-APIC interrupt routing
> +   options.
> +
> +   This behaviour can be inhibited by specifying `legacy-replacement=0`.
> +   Alternatively, this mode can be enabled unconditionally (if available) by
> +   specifying `legacy-replacement=1`.
> +
>   ### hpetbroadcast (x86)
>   > `= <boolean>`
>   
> +Deprecated alternative of `hpet=broadcast`.
> +
>   ### hvm_debug (x86)
>   > `= <integer>`
>   
> diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
> index c73135bb15..957e053a47 100644
> --- a/xen/arch/x86/hpet.c
> +++ b/xen/arch/x86/hpet.c
> @@ -52,6 +52,8 @@ static unsigned int __read_mostly num_hpets_used;
>   DEFINE_PER_CPU(struct hpet_event_channel *, cpu_bc_channel);
>   
>   unsigned long __initdata hpet_address;
> +int8_t __initdata opt_hpet_legacy_replacement = -1;
> +static bool __initdata opt_hpet = true;
>   u8 __initdata hpet_blockid;
>   u8 __initdata hpet_flags;
>   
> @@ -63,6 +65,32 @@ u8 __initdata hpet_flags;
>   static bool __initdata force_hpet_broadcast;
>   boolean_param("hpetbroadcast", force_hpet_broadcast);
>   
> +static int __init parse_hpet_param(const char *s)
> +{
> +    const char *ss;
> +    int val, rc = 0;
> +
> +    do {
> +        ss = strchr(s, ',');
> +        if ( !ss )
> +            ss = strchr(s, '\0');
> +
> +        if ( (val = parse_bool(s, ss)) >= 0 )
> +            opt_hpet = val;
> +        else if ( (val = parse_boolean("broadcast", s, ss)) >= 0 )
> +            force_hpet_broadcast = val;
> +        else if ( (val = parse_boolean("legacy-replacement", s, ss)) >= 0 )
> +            opt_hpet_legacy_replacement = val;
> +        else
> +            rc = -EINVAL;
> +
> +        s = ss + 1;
> +    } while ( *ss );
> +
> +    return rc;
> +}
> +custom_param("hpet", parse_hpet_param);
> +
>   /*
>    * Calculate a multiplication factor for scaled math, which is used to convert
>    * nanoseconds based values to clock ticks:
> @@ -820,12 +848,9 @@ u64 __init hpet_setup(void)
>       unsigned int hpet_id, hpet_period;
>       unsigned int last, rem;
>   
> -    if ( hpet_rate )
> +    if ( hpet_rate || !hpet_address || !opt_hpet )
>           return hpet_rate;
>   
> -    if ( hpet_address == 0 )
> -        return 0;
> -
>       set_fixmap_nocache(FIX_HPET_BASE, hpet_address);
>   
>       hpet_id = hpet_read32(HPET_ID);
> @@ -852,19 +877,8 @@ u64 __init hpet_setup(void)
>       if ( (rem * 2) > hpet_period )
>           hpet_rate++;
>   
> -    /*
> -     * Intel chipsets from Skylake/ApolloLake onwards can statically clock
> -     * gate the 8259 PIT.  This option is enabled by default in slightly later
> -     * systems, as turning the PIT off is a prerequisite to entering the C11
> -     * power saving state.
> -     *
> -     * Xen currently depends on the legacy timer interrupt being active while
> -     * IRQ routing is configured.
> -     *
> -     * Reconfigure the HPET into legacy mode to re-establish the timer
> -     * interrupt.
> -     */
> -    hpet_enable_legacy_replacement_mode();
> +    if ( opt_hpet_legacy_replacement > 0 )
> +        hpet_enable_legacy_replacement_mode();
>   
>       return hpet_rate;
>   }
> diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
> index e93265f379..f08c60d71f 100644
> --- a/xen/arch/x86/io_apic.c
> +++ b/xen/arch/x86/io_apic.c
> @@ -29,6 +29,8 @@
>   #include <xen/acpi.h>
>   #include <xen/keyhandler.h>
>   #include <xen/softirq.h>
> +
> +#include <asm/hpet.h>
>   #include <asm/mc146818rtc.h>
>   #include <asm/smp.h>
>   #include <asm/desc.h>
> @@ -1922,14 +1924,38 @@ static void __init check_timer(void)
>              vector, apic1, pin1, apic2, pin2);
>   
>       if (pin1 != -1) {
> +        bool hpet_changed = false;
> +
>           /*
>            * Ok, does IRQ0 through the IOAPIC work?
>            */
>           unmask_IO_APIC_irq(irq_to_desc(0));
> +    retry_ioapic_pin:
>           if (timer_irq_works()) {
>               local_irq_restore(flags);
>               return;
>           }
> +
> +        /*
> +         * Intel chipsets from Skylake/ApolloLake onwards can statically clock
> +         * gate the 8259 PIT.  This option is enabled by default in slightly
> +         * later systems, as turning the PIT off is a prerequisite to entering
> +         * the C11 power saving state.
> +         *
> +         * Xen currently depends on the legacy timer interrupt being active
> +         * while IRQ routing is configured.
> +         *
> +         * If the user hasn't made an explicit option, attempt to reconfigure
> +         * the HPET into legacy mode to re-establish the timer interrupt.
> +         */
> +        if ( opt_hpet_legacy_replacement < 0 &&
> +             !hpet_changed && hpet_enable_legacy_replacement_mode() )
> +        {
> +            printk(XENLOG_ERR "..no 8254 timer found - trying HPET Legacy Replacement Mode\n");
> +            hpet_changed = true;
> +            goto retry_ioapic_pin;
> +        }
> +
>           clear_IO_APIC_pin(apic1, pin1);
>           printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to "
>                  "IO-APIC\n");
> diff --git a/xen/include/asm-x86/hpet.h b/xen/include/asm-x86/hpet.h
> index 50176de3d2..07bc8d6079 100644
> --- a/xen/include/asm-x86/hpet.h
> +++ b/xen/include/asm-x86/hpet.h
> @@ -53,6 +53,7 @@
>   extern unsigned long hpet_address;
>   extern u8 hpet_blockid;
>   extern u8 hpet_flags;
> +extern int8_t opt_hpet_legacy_replacement;
>   
>   /*
>    * Detect and initialise HPET hardware: return counter update frequency.
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally
  2021-03-26 13:43           ` Marek Marczykowski-Górecki
@ 2021-03-26 14:02             ` Jan Beulich
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2021-03-26 14:02 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Ian Jackson, Andrew Cooper, Xen-devel, Wei Liu,
	Frédéric Pierret, Roger Pau Monné

On 26.03.2021 14:43, Marek Marczykowski-Górecki wrote:
> On Fri, Mar 26, 2021 at 02:30:09PM +0100, Jan Beulich wrote:
>> On 26.03.2021 13:29, Ian Jackson wrote:
>>> I wrote:
>>>> I'm sorry, but I think it is too late for 4.15 to do this.  I prefer
>>>> Jan's patch which I have alread release-acked.
>>>>
>>>> Can someone qualified please provide a maintainer review for this,
>>>> ideally today ?
>>>
>>> I asked Andrew on IRC:
>>>
>>> 12:08 <Diziet> andyhhp__: Are you prepared to maintainer-ack Jan's
>>>                more-minimal hpet workaround approach ?
>>> 12:16 <andyhhp__> Diziet: honestly, no.  I don't consider that
>>>                   acceptable behaviour, and it is a fairly big "f you"
>>>                   (this was literally feedback I got in private) to
>>>                   the downstreams who've spent years trying to get us
>>>                   to fix this bug, and have now backported the first
>>>                   version.
>>> 12:16 <andyhhp__> I'm looking into the feedback on my series
>>> 12:17 <andyhhp__> one way or another, the moment we enter the fallback
>>>                   path for interrupt routing, something is very broken
>>>                   on the system
>>> 12:19 <andyhhp__> so the tradeoff is an unspecified bug on one ancient
>>>                   laptop which can't be tested now, vs 5 years of Atom
>>>                   CPUs, 2 years of latop CPUs, and the forthcoming
>>>                   Server line of Intel CPUs
>>> 12:19 <andyhhp__> or whatever other compromise we can work on
>>>
>>> I'm sorry that this bug is going to continue to be not properly fixed.
>>
>> Actually I had another thought here in the morning, but then didn't
>> write it down: While Andrew's approach indeed would (hopefully)
>> improve user experience, it'll reduce the incentive of actually
>> fixing the issue. Normally I might not be that concerned, but seeing
>> how long it took to even arrive at a workaround, I'm afraid now I am
>> concerned.
> 
> I assume "the issue" you meant "Xen using legacy stuff that stops being
> supported by the hardware", right? Yes it is an issue. But for most
> users of Xen, having it broken more likely will results in "lets switch
> to something that works" (perhaps not after the first such case, but
> this is definitely not the first one) instead of "lets spend some hours
> on debugging this very low level issue".

Like sadly is the case in so many areas nowadays, this to me suggests
that you value short term benefits over things working correctly long
term. Yes, it is important to be attractive to users. But this would
better not be at the price of keeping in place workarounds for overly
long periods of time, possible even forever. Such is likely to bite
us (perhaps by way of biting some of our users) down the road.

To be honest, I find it very strange that the original report over a
month ago was never followed up by the necessary technical detail.
Andrew did tell me that outside of the report on the mailing list he
did explicitly ask for such. (I can't rule out that meanwhile he was
given the info, but really all of this would better be on xen-devel.)

> And to be honest, this is more and more appealing option, even though
> all the deficiencies of KVM...

Well, feel free to throw more engineering resources into Xen's
(upstream) maintenance. There being a much larger community of
engineers around KVM is perhaps the main reason here.

Jan


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally
  2021-03-26 13:37             ` Jan Beulich
@ 2021-03-26 14:07               ` Ian Jackson
  0 siblings, 0 replies; 27+ messages in thread
From: Ian Jackson @ 2021-03-26 14:07 UTC (permalink / raw)
  To: committers
  Cc: Jan Beulich, Roger Pau Monné,
	Andrew Cooper, Xen-devel, Wei Liu,
	Marek Marczykowski-Górecki, Frédéric Pierret,
	Tamas K Lengyel

Jan Beulich writes ("Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally"):
> TBH delaying the release for this specific problem should be seriously
> considered imo. In principle I'm in favor of (3) of the above, if there
> weren't the downsides I did mention in prior mails.

Thanks for putting forward your opinion.  I am always happy to hear
what people say and this input is very valuable.  However:

I am not inclined to delay the release over this.  Delaying the
release might be appropriate if this problem was unforeseen and
recently discovered, late in the freeze.  But it was not.

That there was a significant regression caused by e1de4c196a2e
  x86/timer: Fix boot on Intel systems using ITSSPRC static PIT clock gating
was already known at least by the 24th of February[1].

Since then, that change has been at risk of being reverted if it went
unfixed.  Unfortunately the the first cut of patches to try fix this
something like properly were only posted yesterday.

It is up to everyone who wants something to make it into the release,
to make sure that the code is ready in time.  That includes sorting
out any regressions it introduces.  In the case of e1de4c196a2e that
has not occurred.

It doesn't seem to me that we will have sufficient confidence in the
more comphrenesive fix, for it to go into staging-4.15 today.

I think the appropriate course, therefore, is the conditional (based
on commaned line) revert proposed by Jan.

Sorry,
Ian.

[1] https://lists.xenproject.org/archives/html/xen-devel/2021-02/msg01533.html


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally
  2021-03-26 13:15           ` Ian Jackson
  2021-03-26 13:37             ` Jan Beulich
@ 2021-03-26 14:23             ` Marek Marczykowski-Górecki
  2021-03-26 15:02               ` Tamas K Lengyel
  2021-03-29 13:30             ` Roger Pau Monné
  2 siblings, 1 reply; 27+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-03-26 14:23 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Tamas K Lengyel, Roger Pau Monné,
	Andrew Cooper, Xen-devel, Jan Beulich, Wei Liu,
	Frédéric Pierret

[-- Attachment #1: Type: text/plain, Size: 2967 bytes --]

On Fri, Mar 26, 2021 at 01:15:42PM +0000, Ian Jackson wrote:
> Tamas K Lengyel writes ("Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally"):
> > The problem from my perspective is that the end-users have no way to
> > determine when that boot option is needing to be set. Having an
> > installation step of "check if things explode when you reboot" is just
> > plain bad. Many times you don't even have access to a remote serial
> > console to check why Xen didn't boot. So that's not a realistic route
> > that can be taken. If Jan's patch is applied then the only thing I'll
> > be able to do is make all installations always-enable this option even
> > on systems that would have booted fine otherwise without it. It is
> > unclear if that has any downsides of its own and could very well just
> > kick the can down the road and lead to other issues.
> 
> Thanks for the clear explanation.
> 
> I think our options are:
> 
>  1. What is currently in xen.git#staging-4.15: some different set of
>     machines do not work (reliably? at all?), constituting a
>     regression on older hardware.
> 
>  2. Jan's patch, with the consequences you describe.  Constituing a
>     continued failure to properly support the newer hardware.
> 
>  3. Andy's patches which are not finished yet and are therefore high
>     risk.  Ie, delay the release.

I do have several confirmations that the "x86/timer: Fix boot on Intel
systems using ITSSPRC static PIT clock gating" patch indeed unbreaks
several Intel systems. And only one report about it causing a regression
on some AMD (although I may miss some others on the list).
Reverting to the previous default behavior I would also call a
regression.

I have tested Andy's patches on several machines and I can confirm they
fixed the issue - both keep the original issue fixed and fixes the
regression.
I see also Frédéric (who originally reported the regression) also
confirms it fixes it for him.

> Please let me know if you think this characterisation of the situation
> is inaccurate or misleading.

Both versions (with "x86/timer: Fix boot on Intel systems using ITSSPRC
static PIT clock gating" and without it) got significant testing and
results are as you summarize - each of those variants alone is broken on
some subset of hardware. What Andrew's patches do is to combine both
versions into one, to choose the right behavior depending on the
hardware. Specifically, apply the workaround in place of direct panic.
This isn't some completely new behavior. I think it is reasonably safe
to have it included in the release, even at such late time.

> This is not a good set of options.
> 
> Of them, I still think I would choose (2).  But I would love it if
> someone were to come up with a better suggestion (perhaps a variant on
> one of the above).


-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 484 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally
  2021-03-26 14:23             ` Marek Marczykowski-Górecki
@ 2021-03-26 15:02               ` Tamas K Lengyel
  0 siblings, 0 replies; 27+ messages in thread
From: Tamas K Lengyel @ 2021-03-26 15:02 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Ian Jackson, Roger Pau Monné,
	Andrew Cooper, Xen-devel, Jan Beulich, Wei Liu,
	Frédéric Pierret

On Fri, Mar 26, 2021 at 10:23 AM Marek Marczykowski-Górecki
<marmarek@invisiblethingslab.com> wrote:
>
> On Fri, Mar 26, 2021 at 01:15:42PM +0000, Ian Jackson wrote:
> > Tamas K Lengyel writes ("Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally"):
> > > The problem from my perspective is that the end-users have no way to
> > > determine when that boot option is needing to be set. Having an
> > > installation step of "check if things explode when you reboot" is just
> > > plain bad. Many times you don't even have access to a remote serial
> > > console to check why Xen didn't boot. So that's not a realistic route
> > > that can be taken. If Jan's patch is applied then the only thing I'll
> > > be able to do is make all installations always-enable this option even
> > > on systems that would have booted fine otherwise without it. It is
> > > unclear if that has any downsides of its own and could very well just
> > > kick the can down the road and lead to other issues.
> >
> > Thanks for the clear explanation.
> >
> > I think our options are:
> >
> >  1. What is currently in xen.git#staging-4.15: some different set of
> >     machines do not work (reliably? at all?), constituting a
> >     regression on older hardware.
> >
> >  2. Jan's patch, with the consequences you describe.  Constituing a
> >     continued failure to properly support the newer hardware.
> >
> >  3. Andy's patches which are not finished yet and are therefore high
> >     risk.  Ie, delay the release.
>
> I do have several confirmations that the "x86/timer: Fix boot on Intel
> systems using ITSSPRC static PIT clock gating" patch indeed unbreaks
> several Intel systems. And only one report about it causing a regression
> on some AMD (although I may miss some others on the list).
> Reverting to the previous default behavior I would also call a
> regression.
>
> I have tested Andy's patches on several machines and I can confirm they
> fixed the issue - both keep the original issue fixed and fixes the
> regression.
> I see also Frédéric (who originally reported the regression) also
> confirms it fixes it for him.
>
> > Please let me know if you think this characterisation of the situation
> > is inaccurate or misleading.
>
> Both versions (with "x86/timer: Fix boot on Intel systems using ITSSPRC
> static PIT clock gating" and without it) got significant testing and
> results are as you summarize - each of those variants alone is broken on
> some subset of hardware. What Andrew's patches do is to combine both
> versions into one, to choose the right behavior depending on the
> hardware. Specifically, apply the workaround in place of direct panic.
> This isn't some completely new behavior. I think it is reasonably safe
> to have it included in the release, even at such late time.

My preference would also be going with route 3, if possible in 4.15
from the start. If that can't happen without significant delay to the
release then it should be the first patch to be included for 4.15.1.

Thanks,
Tamas


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally
  2021-03-26  9:51     ` Jan Beulich
@ 2021-03-26 16:32       ` Andrew Cooper
  2021-03-26 16:48         ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Cooper @ 2021-03-26 16:32 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné,
	Wei Liu, Ian Jackson, Marek Marczykowski-Górecki,
	Frédéric Pierret, Xen-devel

On 26/03/2021 09:51, Jan Beulich wrote:
> On 25.03.2021 18:21, Andrew Cooper wrote:
>> --- a/docs/misc/xen-command-line.pandoc
>> +++ b/docs/misc/xen-command-line.pandoc
>> @@ -1274,9 +1274,42 @@ supported. See docs/misc/arm/big.LITTLE.txt for more information.
>>  When the hmp-unsafe option is disabled (default), CPUs that are not
>>  identical to the boot CPU will be parked and not used by Xen.
>>  
>> +### hpet (x86)
>> +    = List of [ <bool> | broadcast=<bool> | legacy-replacement=<bool> ]
>> +
>> +    Applicability: x86
> If this is the more modern form to express this information, then the
> (x86) I did put on the sub-title line should imo be dropped.

Oops yes.

>
>> +Controls Xen's use of the system's High Precision Event Timer.  By default,
>> +Xen will use an HPET when available and not subject to errata.  Use of the
>> +HPET can be disabled by specifying `hpet=0`.
>> +
>> + * The `broadcast` boolean is disabled by default, but forces Xen to keep
>> +   using the broadcast for CPUs in deep C-states even when an RTC interrupt is
>> +   enabled.  This then also affects raising of the RTC interrupt.
>> +
>> + * The `legacy-replacement` boolean allows for control over whether Legacy
>> +   Replacement mode is enabled.
>> +
>> +   Legacy Replacement mode is intended for hardware which does not have an
>> +   8025 PIT, and allows the HPET to be configured into a compatible mode.
> 8254 ?

I did spot and fix that...

>
>> @@ -1922,14 +1924,38 @@ static void __init check_timer(void)
>>             vector, apic1, pin1, apic2, pin2);
>>  
>>      if (pin1 != -1) {
>> +        bool hpet_changed = false;
>> +
>>          /*
>>           * Ok, does IRQ0 through the IOAPIC work?
>>           */
>>          unmask_IO_APIC_irq(irq_to_desc(0));
>> +    retry_ioapic_pin:
>>          if (timer_irq_works()) {
>>              local_irq_restore(flags);
>>              return;
>>          }
>> +
>> +        /*
>> +         * Intel chipsets from Skylake/ApolloLake onwards can statically clock
>> +         * gate the 8259 PIT.  This option is enabled by default in slightly
> 8254?

but failed to spot this one.  (It was an error from the original
patch).  Fixed.

>
>> +         * later systems, as turning the PIT off is a prerequisite to entering
>> +         * the C11 power saving state.
>> +         *
>> +         * Xen currently depends on the legacy timer interrupt being active
>> +         * while IRQ routing is configured.
>> +         *
>> +         * If the user hasn't made an explicit option, attempt to reconfigure
> s/option/choice/ or s/made/given/?
>
>> +         * the HPET into legacy mode to re-establish the timer interrupt.
>> +         */
>> +        if ( opt_hpet_legacy_replacement < 0 &&
>> +             !hpet_changed && hpet_enable_legacy_replacement_mode() )
>> +        {
>> +            printk(XENLOG_ERR "..no 8254 timer found - trying HPET Legacy Replacement Mode\n");
>> +            hpet_changed = true;
>> +            goto retry_ioapic_pin;
>> +        }
>> +
>>          clear_IO_APIC_pin(apic1, pin1);
>>          printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to "
>>                 "IO-APIC\n");
> As mentioned on irc already, I'm somewhat concerned by doing this change
> first (and also not undoing it if it didn't work). An AMD Turion based
> laptop I was using many years ago required one of the other fallbacks to
> be engaged, and hence I'd expect certain other (old?) systems to be
> similarly affected. Sadly (for the purposes here) I don't have this
> laptop anymore, so wouldn't be able to verify whether the above actually
> breaks there.

Turion is K8, so very obsolete these days.  If it doesn't have an
IO-APIC, its even less likely to have an HPET.

Even if it does have an HPET, there isn't anything to suggest that
legacy replacement mode is broken.

Would you prefer me to undo the change?  Its not easy - we have the boot
time config stashed, but if it was periodic before, the accumulator is
broken because we can never read that value back out.

> As a minor remark, I find the "goto" based approach not very nice (we've
> been generally saying we consider "goto" okay largely for simplification
> of error handling, to avoid having many [partly] redundant function exit
> paths), but I can see how using for() or while() or do/while() would
> make the code larger and require deeper indentation.

Actually there is rather less to repeat than I was expecting.  I think
it is reasonable to take out the goto.

I don't think we want to multiply this with all fallback modes.  The
issue we're fixing here (new systems don't have a PIT) is orthogonal to
the rest of the fallback chain here which is looking for PIC/APIC problems.

~Andrew



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally
  2021-03-26 16:32       ` Andrew Cooper
@ 2021-03-26 16:48         ` Jan Beulich
  2021-03-26 16:53           ` Andrew Cooper
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2021-03-26 16:48 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Ian Jackson, Marek Marczykowski-Górecki,
	Frédéric Pierret, Xen-devel

On 26.03.2021 17:32, Andrew Cooper wrote:
> On 26/03/2021 09:51, Jan Beulich wrote:
>> On 25.03.2021 18:21, Andrew Cooper wrote:
>>> @@ -1922,14 +1924,38 @@ static void __init check_timer(void)
>>>             vector, apic1, pin1, apic2, pin2);
>>>  
>>>      if (pin1 != -1) {
>>> +        bool hpet_changed = false;
>>> +
>>>          /*
>>>           * Ok, does IRQ0 through the IOAPIC work?
>>>           */
>>>          unmask_IO_APIC_irq(irq_to_desc(0));
>>> +    retry_ioapic_pin:
>>>          if (timer_irq_works()) {
>>>              local_irq_restore(flags);
>>>              return;
>>>          }
>>> +
>>> +        /*
>>> +         * Intel chipsets from Skylake/ApolloLake onwards can statically clock
>>> +         * gate the 8259 PIT.  This option is enabled by default in slightly
>>> +         * later systems, as turning the PIT off is a prerequisite to entering
>>> +         * the C11 power saving state.
>>> +         *
>>> +         * Xen currently depends on the legacy timer interrupt being active
>>> +         * while IRQ routing is configured.
>>> +         *
>>> +         * If the user hasn't made an explicit option, attempt to reconfigure
>>> +         * the HPET into legacy mode to re-establish the timer interrupt.
>>> +         */
>>> +        if ( opt_hpet_legacy_replacement < 0 &&
>>> +             !hpet_changed && hpet_enable_legacy_replacement_mode() )
>>> +        {
>>> +            printk(XENLOG_ERR "..no 8254 timer found - trying HPET Legacy Replacement Mode\n");
>>> +            hpet_changed = true;
>>> +            goto retry_ioapic_pin;
>>> +        }
>>> +
>>>          clear_IO_APIC_pin(apic1, pin1);
>>>          printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to "
>>>                 "IO-APIC\n");
>> As mentioned on irc already, I'm somewhat concerned by doing this change
>> first (and also not undoing it if it didn't work). An AMD Turion based
>> laptop I was using many years ago required one of the other fallbacks to
>> be engaged, and hence I'd expect certain other (old?) systems to be
>> similarly affected. Sadly (for the purposes here) I don't have this
>> laptop anymore, so wouldn't be able to verify whether the above actually
>> breaks there.
> 
> Turion is K8, so very obsolete these days.  If it doesn't have an
> IO-APIC, its even less likely to have an HPET.

It did have an IO-APIC, but required one of the virtual-wire modes to
be enabled iirc.

> Even if it does have an HPET, there isn't anything to suggest that
> legacy replacement mode is broken.

With one firmware flaw there is about as much chance for another one
as there is for HPET to be working, I'd say. Iirc (very vaguely) it
did have a HPET, but no ACPI table entry for it, so we wouldn't have
used it.

> Would you prefer me to undo the change?  Its not easy - we have the boot
> time config stashed, but if it was periodic before, the accumulator is
> broken because we can never read that value back out.

I didn't think the accumulator change would matter. I did think though
not having been in legacy replacement mode before might be better to
also not be in after, if its enabling didn't help anyway.

Jan


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally
  2021-03-26 16:48         ` Jan Beulich
@ 2021-03-26 16:53           ` Andrew Cooper
  0 siblings, 0 replies; 27+ messages in thread
From: Andrew Cooper @ 2021-03-26 16:53 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné,
	Wei Liu, Ian Jackson, Marek Marczykowski-Górecki,
	Frédéric Pierret, Xen-devel

On 26/03/2021 16:48, Jan Beulich wrote:
> On 26.03.2021 17:32, Andrew Cooper wrote:
>> On 26/03/2021 09:51, Jan Beulich wrote:
>>> On 25.03.2021 18:21, Andrew Cooper wrote:
>>>> @@ -1922,14 +1924,38 @@ static void __init check_timer(void)
>>>>             vector, apic1, pin1, apic2, pin2);
>>>>  
>>>>      if (pin1 != -1) {
>>>> +        bool hpet_changed = false;
>>>> +
>>>>          /*
>>>>           * Ok, does IRQ0 through the IOAPIC work?
>>>>           */
>>>>          unmask_IO_APIC_irq(irq_to_desc(0));
>>>> +    retry_ioapic_pin:
>>>>          if (timer_irq_works()) {
>>>>              local_irq_restore(flags);
>>>>              return;
>>>>          }
>>>> +
>>>> +        /*
>>>> +         * Intel chipsets from Skylake/ApolloLake onwards can statically clock
>>>> +         * gate the 8259 PIT.  This option is enabled by default in slightly
>>>> +         * later systems, as turning the PIT off is a prerequisite to entering
>>>> +         * the C11 power saving state.
>>>> +         *
>>>> +         * Xen currently depends on the legacy timer interrupt being active
>>>> +         * while IRQ routing is configured.
>>>> +         *
>>>> +         * If the user hasn't made an explicit option, attempt to reconfigure
>>>> +         * the HPET into legacy mode to re-establish the timer interrupt.
>>>> +         */
>>>> +        if ( opt_hpet_legacy_replacement < 0 &&
>>>> +             !hpet_changed && hpet_enable_legacy_replacement_mode() )
>>>> +        {
>>>> +            printk(XENLOG_ERR "..no 8254 timer found - trying HPET Legacy Replacement Mode\n");
>>>> +            hpet_changed = true;
>>>> +            goto retry_ioapic_pin;
>>>> +        }
>>>> +
>>>>          clear_IO_APIC_pin(apic1, pin1);
>>>>          printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to "
>>>>                 "IO-APIC\n");
>>> As mentioned on irc already, I'm somewhat concerned by doing this change
>>> first (and also not undoing it if it didn't work). An AMD Turion based
>>> laptop I was using many years ago required one of the other fallbacks to
>>> be engaged, and hence I'd expect certain other (old?) systems to be
>>> similarly affected. Sadly (for the purposes here) I don't have this
>>> laptop anymore, so wouldn't be able to verify whether the above actually
>>> breaks there.
>> Turion is K8, so very obsolete these days.  If it doesn't have an
>> IO-APIC, its even less likely to have an HPET.
> It did have an IO-APIC, but required one of the virtual-wire modes to
> be enabled iirc.
>
>> Even if it does have an HPET, there isn't anything to suggest that
>> legacy replacement mode is broken.
> With one firmware flaw there is about as much chance for another one
> as there is for HPET to be working, I'd say. Iirc (very vaguely) it
> did have a HPET, but no ACPI table entry for it, so we wouldn't have
> used it.
>
>> Would you prefer me to undo the change?  Its not easy - we have the boot
>> time config stashed, but if it was periodic before, the accumulator is
>> broken because we can never read that value back out.
> I didn't think the accumulator change would matter. I did think though
> not having been in legacy replacement mode before might be better to
> also not be in after, if its enabling didn't help anyway.

The accumulator matters if chan0 was configured as periodic previously.

Then again, this is broken anyway generally (e.g. the S3 path), so I
suppose we're not making it any worse here.

~Andrew


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally
  2021-03-26 13:15           ` Ian Jackson
  2021-03-26 13:37             ` Jan Beulich
  2021-03-26 14:23             ` Marek Marczykowski-Górecki
@ 2021-03-29 13:30             ` Roger Pau Monné
  2 siblings, 0 replies; 27+ messages in thread
From: Roger Pau Monné @ 2021-03-29 13:30 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Tamas K Lengyel, Andrew Cooper, Xen-devel, Jan Beulich, Wei Liu,
	Marek Marczykowski-Górecki, Frédéric Pierret

On Fri, Mar 26, 2021 at 01:15:42PM +0000, Ian Jackson wrote:
> Tamas K Lengyel writes ("Re: [PATCH v1.1 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally"):
> > The problem from my perspective is that the end-users have no way to
> > determine when that boot option is needing to be set. Having an
> > installation step of "check if things explode when you reboot" is just
> > plain bad. Many times you don't even have access to a remote serial
> > console to check why Xen didn't boot. So that's not a realistic route
> > that can be taken. If Jan's patch is applied then the only thing I'll
> > be able to do is make all installations always-enable this option even
> > on systems that would have booted fine otherwise without it. It is
> > unclear if that has any downsides of its own and could very well just
> > kick the can down the road and lead to other issues.
> 
> Thanks for the clear explanation.
> 
> I think our options are:
> 
>  1. What is currently in xen.git#staging-4.15: some different set of
>     machines do not work (reliably? at all?), constituting a
>     regression on older hardware.
> 
>  2. Jan's patch, with the consequences you describe.  Constituing a
>     continued failure to properly support the newer hardware.
> 
>  3. Andy's patches which are not finished yet and are therefore high
>     risk.  Ie, delay the release.
> 
> Please let me know if you think this characterisation of the situation
> is inaccurate or misleading.
> 
> This is not a good set of options.
> 
> Of them, I still think I would choose (2).  But I would love it if
> someone were to come up with a better suggestion (perhaps a variant on
> one of the above).

As the FreeBSD Xen packager I would consider simply adding Andrew's
patches to the port under my own risk, and maybe do the same with the
vpt performance fix, but that one is riskier as an issue there could
lead to XSA-336 being re-introduced, so I need to carefully consider.
I've cherry picked patches before to fix other issues before they hit
the stable branches.

I'm still trying to go over all emails, but if 2. is the chosen route
could we describe in the release notes those issues and maybe provide
hashes for the alternative patches provided they are in unstable by
the time of the release?

That way packagers will get an option to cherry pick those fixes at
their own risk. It's not the best model, as we are just pushing a
decisions towards consumers which might not have good judgment about
the effect of those issues and the risk of the fixes, but seeing how
much controversy this has caused it's likely an option to be
considered so that we are not seen as hiding such patches from
downstreams.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2021-03-29 13:30 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-25 16:52 [PATCH for-4.15 0/2] x86/hpet: Try to unbreak Ryzen 1800X systems Andrew Cooper
2021-03-25 16:52 ` [PATCH 1/2] x86/hpet: Factor hpet_enable_legacy_replacement_mode() out of hpet_setup() Andrew Cooper
2021-03-26  9:59   ` Jan Beulich
2021-03-26 10:53     ` Andrew Cooper
2021-03-26 10:55       ` Jan Beulich
2021-03-25 16:52 ` [PATCH 2/2] x86/hpet: Don't enable legacy replacement mode unconditionally Andrew Cooper
2021-03-25 17:00   ` Andrew Cooper
2021-03-25 17:21   ` [PATCH v1.1 " Andrew Cooper
2021-03-26  9:51     ` Jan Beulich
2021-03-26 16:32       ` Andrew Cooper
2021-03-26 16:48         ` Jan Beulich
2021-03-26 16:53           ` Andrew Cooper
2021-03-26 12:03     ` Ian Jackson
2021-03-26 12:29       ` Ian Jackson
2021-03-26 13:03         ` Tamas K Lengyel
2021-03-26 13:15           ` Ian Jackson
2021-03-26 13:37             ` Jan Beulich
2021-03-26 14:07               ` Ian Jackson
2021-03-26 14:23             ` Marek Marczykowski-Górecki
2021-03-26 15:02               ` Tamas K Lengyel
2021-03-29 13:30             ` Roger Pau Monné
2021-03-26 13:31           ` Jan Beulich
2021-03-26 13:39             ` Ian Jackson
2021-03-26 13:30         ` Jan Beulich
2021-03-26 13:43           ` Marek Marczykowski-Górecki
2021-03-26 14:02             ` Jan Beulich
2021-03-26 13:50     ` Frédéric Pierret

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).