All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] target/ppc: cpu_init: Clean up stop state on cpu reset
@ 2022-06-17  9:52 Frederic Barrat
  2022-06-17 14:57 ` Fabiano Rosas
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Frederic Barrat @ 2022-06-17  9:52 UTC (permalink / raw)
  To: clg, danielhb413, qemu-ppc, qemu-devel

The 'resume_as_sreset' attribute of a cpu is set when a thread is
entering a stop state on ppc books. It causes the thread to be
re-routed to vector 0x100 when woken up by an exception. So it must be
cleared on reset or a thread might be re-routed unexpectedly after a
reset, when it was not in a stop state and/or when the appropriate
exception handler isn't set up yet.

Using skiboot, it can be tested by resetting the system when it is
quiet and most threads are idle and in stop state.

After the reset occurs, skiboot elects a primary thread and all the
others wait in secondary_wait. The primary thread does all the system
initialization from main_cpu_entry() and at some point, the
decrementer interrupt starts ticking. The exception vector for the
decrementer interrupt is in place, so that shouldn't be a
problem. However, if that primary thread was in stop state prior to
the reset, and because the resume_as_sreset parameters is still set,
it is re-routed to exception vector 0x100. Which, at that time, is
still defined as the entry point for BML. So that primary thread
restarts as new and ends up being treated like any other secondary
thread. All threads are now waiting in secondary_wait.

It results in a full system hang with no message on the console, as
the uart hasn't been init'ed yet. It's actually not obvious to realise
what's happening if not tracing reset (-d cpu_reset). The fix is
simply to clear the 'resume_as_sreset' attribute on reset.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
---
Changelog:
v2: rework commit message


 target/ppc/cpu_init.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 0f891afa04..c16cb8dbe7 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -7186,6 +7186,9 @@ static void ppc_cpu_reset(DeviceState *dev)
         }
         pmu_update_summaries(env);
     }
+
+    /* clean any pending stop state */
+    env->resume_as_sreset = 0;
 #endif
     hreg_compute_hflags(env);
     env->reserve_addr = (target_ulong)-1ULL;
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] target/ppc: cpu_init: Clean up stop state on cpu reset
  2022-06-17  9:52 [PATCH v2] target/ppc: cpu_init: Clean up stop state on cpu reset Frederic Barrat
@ 2022-06-17 14:57 ` Fabiano Rosas
  2022-06-17 19:45 ` Daniel Henrique Barboza
  2022-06-18 11:35 ` Cédric Le Goater
  2 siblings, 0 replies; 4+ messages in thread
From: Fabiano Rosas @ 2022-06-17 14:57 UTC (permalink / raw)
  To: Frederic Barrat, clg, danielhb413, qemu-ppc, qemu-devel

Frederic Barrat <fbarrat@linux.ibm.com> writes:

> The 'resume_as_sreset' attribute of a cpu is set when a thread is
> entering a stop state on ppc books. It causes the thread to be
> re-routed to vector 0x100 when woken up by an exception. So it must be
> cleared on reset or a thread might be re-routed unexpectedly after a
> reset, when it was not in a stop state and/or when the appropriate
> exception handler isn't set up yet.
>
> Using skiboot, it can be tested by resetting the system when it is
> quiet and most threads are idle and in stop state.
>
> After the reset occurs, skiboot elects a primary thread and all the
> others wait in secondary_wait. The primary thread does all the system
> initialization from main_cpu_entry() and at some point, the
> decrementer interrupt starts ticking. The exception vector for the
> decrementer interrupt is in place, so that shouldn't be a
> problem. However, if that primary thread was in stop state prior to
> the reset, and because the resume_as_sreset parameters is still set,
> it is re-routed to exception vector 0x100. Which, at that time, is
> still defined as the entry point for BML. So that primary thread
> restarts as new and ends up being treated like any other secondary
> thread. All threads are now waiting in secondary_wait.
>
> It results in a full system hang with no message on the console, as
> the uart hasn't been init'ed yet. It's actually not obvious to realise
> what's happening if not tracing reset (-d cpu_reset). The fix is
> simply to clear the 'resume_as_sreset' attribute on reset.
>
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>

Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] target/ppc: cpu_init: Clean up stop state on cpu reset
  2022-06-17  9:52 [PATCH v2] target/ppc: cpu_init: Clean up stop state on cpu reset Frederic Barrat
  2022-06-17 14:57 ` Fabiano Rosas
@ 2022-06-17 19:45 ` Daniel Henrique Barboza
  2022-06-18 11:35 ` Cédric Le Goater
  2 siblings, 0 replies; 4+ messages in thread
From: Daniel Henrique Barboza @ 2022-06-17 19:45 UTC (permalink / raw)
  To: Frederic Barrat, clg, qemu-ppc, qemu-devel

Queued in gitlab.com/danielhb/qemu/tree/ppc-next after adding
Cedric's R-b from v1.


Thanks,

Daniel


On 6/17/22 06:52, Frederic Barrat wrote:
> The 'resume_as_sreset' attribute of a cpu is set when a thread is
> entering a stop state on ppc books. It causes the thread to be
> re-routed to vector 0x100 when woken up by an exception. So it must be
> cleared on reset or a thread might be re-routed unexpectedly after a
> reset, when it was not in a stop state and/or when the appropriate
> exception handler isn't set up yet.
> 
> Using skiboot, it can be tested by resetting the system when it is
> quiet and most threads are idle and in stop state.
> 
> After the reset occurs, skiboot elects a primary thread and all the
> others wait in secondary_wait. The primary thread does all the system
> initialization from main_cpu_entry() and at some point, the
> decrementer interrupt starts ticking. The exception vector for the
> decrementer interrupt is in place, so that shouldn't be a
> problem. However, if that primary thread was in stop state prior to
> the reset, and because the resume_as_sreset parameters is still set,
> it is re-routed to exception vector 0x100. Which, at that time, is
> still defined as the entry point for BML. So that primary thread
> restarts as new and ends up being treated like any other secondary
> thread. All threads are now waiting in secondary_wait.
> 
> It results in a full system hang with no message on the console, as
> the uart hasn't been init'ed yet. It's actually not obvious to realise
> what's happening if not tracing reset (-d cpu_reset). The fix is
> simply to clear the 'resume_as_sreset' attribute on reset.
> 
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> ---
> Changelog:
> v2: rework commit message
> 
> 
>   target/ppc/cpu_init.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
> index 0f891afa04..c16cb8dbe7 100644
> --- a/target/ppc/cpu_init.c
> +++ b/target/ppc/cpu_init.c
> @@ -7186,6 +7186,9 @@ static void ppc_cpu_reset(DeviceState *dev)
>           }
>           pmu_update_summaries(env);
>       }
> +
> +    /* clean any pending stop state */
> +    env->resume_as_sreset = 0;
>   #endif
>       hreg_compute_hflags(env);
>       env->reserve_addr = (target_ulong)-1ULL;


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] target/ppc: cpu_init: Clean up stop state on cpu reset
  2022-06-17  9:52 [PATCH v2] target/ppc: cpu_init: Clean up stop state on cpu reset Frederic Barrat
  2022-06-17 14:57 ` Fabiano Rosas
  2022-06-17 19:45 ` Daniel Henrique Barboza
@ 2022-06-18 11:35 ` Cédric Le Goater
  2 siblings, 0 replies; 4+ messages in thread
From: Cédric Le Goater @ 2022-06-18 11:35 UTC (permalink / raw)
  To: Frederic Barrat, danielhb413, qemu-ppc, qemu-devel

On 6/17/22 11:52, Frederic Barrat wrote:
> The 'resume_as_sreset' attribute of a cpu is set when a thread is
> entering a stop state on ppc books. It causes the thread to be
> re-routed to vector 0x100 when woken up by an exception. So it must be
> cleared on reset or a thread might be re-routed unexpectedly after a
> reset, when it was not in a stop state and/or when the appropriate
> exception handler isn't set up yet.
> 
> Using skiboot, it can be tested by resetting the system when it is
> quiet and most threads are idle and in stop state.
> 
> After the reset occurs, skiboot elects a primary thread and all the
> others wait in secondary_wait. The primary thread does all the system
> initialization from main_cpu_entry() and at some point, the
> decrementer interrupt starts ticking. The exception vector for the
> decrementer interrupt is in place, so that shouldn't be a
> problem. However, if that primary thread was in stop state prior to
> the reset, and because the resume_as_sreset parameters is still set,
> it is re-routed to exception vector 0x100. Which, at that time, is
> still defined as the entry point for BML. So that primary thread
> restarts as new and ends up being treated like any other secondary
> thread. All threads are now waiting in secondary_wait.
> 
> It results in a full system hang with no message on the console, as
> the uart hasn't been init'ed yet. It's actually not obvious to realise
> what's happening if not tracing reset (-d cpu_reset). The fix is
> simply to clear the 'resume_as_sreset' attribute on reset.
> 
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> ---
> Changelog:
> v2: rework commit message


Nice ! This has been a long standing bug. I chased it for weeks.
I was reproducing with intensive I/Os, doing an scp on an emulated
PowerNV machine. It hung after a while (unless using powersave=off)

Now, with this patch, a QEMU PowerNV POWER9 machine (SMP) running a
Linux 5.18 sustains the load :

   $ scp ./ubuntu-22.04-ppc64le.qcow2 root@vm103:/dev/null
   root@vm103's password:
   ubuntu-22.04-ppc64le.qcow2                    100% 8581MB   5.8MB/s   24:39

Quite a few interrupts :

   # grep PNV-PCI-MSI  /proc/interrupts
    51:          9          0  PNV-PCI-MSI 403177472 Edge      nvme0q0
    52:          2          0  PNV-PCI-MSI 403177473 Edge      nvme0q1
    53:          0          0  PNV-PCI-MSI 403177474 Edge      nvme0q2
    54:    3427556          0  PNV-PCI-MSI 135315456 Edge      eth0-rx-0
    55:          0    4261742  PNV-PCI-MSI 135315457 Edge      eth0-tx-0
    56:          1          0  PNV-PCI-MSI 135315458 Edge      eth0
    57:          0         71  PNV-PCI-MSI 135299072 Edge      xhci_hcd
    58:          0          0  PNV-PCI-MSI 135299073 Edge      xhci_hcd
    59:          0          0  PNV-PCI-MSI 135299074 Edge      xhci_hcd


It would be nice to explain what you did to corner the issue. It would
help other people chasing similar bugs in QEMU or in the kernel.

Thanks,

C.





> 
> 
>   target/ppc/cpu_init.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
> index 0f891afa04..c16cb8dbe7 100644
> --- a/target/ppc/cpu_init.c
> +++ b/target/ppc/cpu_init.c
> @@ -7186,6 +7186,9 @@ static void ppc_cpu_reset(DeviceState *dev)
>           }
>           pmu_update_summaries(env);
>       }
> +
> +    /* clean any pending stop state */
> +    env->resume_as_sreset = 0;
>   #endif
>       hreg_compute_hflags(env);
>       env->reserve_addr = (target_ulong)-1ULL;



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-06-18 11:40 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-17  9:52 [PATCH v2] target/ppc: cpu_init: Clean up stop state on cpu reset Frederic Barrat
2022-06-17 14:57 ` Fabiano Rosas
2022-06-17 19:45 ` Daniel Henrique Barboza
2022-06-18 11:35 ` Cédric Le Goater

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.