All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] target/ppc: cpu_init: Clean up stop state on cpu reset
@ 2022-06-14  8:29 Frederic Barrat
  2022-06-14 12:44 ` Fabiano Rosas
  2022-06-15  5:23 ` Cédric Le Goater
  0 siblings, 2 replies; 5+ messages in thread
From: Frederic Barrat @ 2022-06-14  8:29 UTC (permalink / raw)
  To: clg, danielhb413, qemu-ppc, qemu-devel

The 'resume_as_sreset' attribute of a cpu can be set when a thread is
entering a stop state on ppc books. It causes the thread to be
re-routed to vector 0x100 when woken up by an exception. So it must be
cleaned on reset or a thread might be re-routed unexpectedly after a
reset, when it was not in a stop state and/or when the appropriate
exception handler isn't set up yet.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
---

I didn't find an appropriate commit to add a "Fixes:". It originates
when adding support for power management states but the code looked
quite different in 2016 and it's not clear whether we were supporting
reset then.

target/ppc/cpu_init.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 0f891afa04..c16cb8dbe7 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -7186,6 +7186,9 @@ static void ppc_cpu_reset(DeviceState *dev)
         }
         pmu_update_summaries(env);
     }
+
+    /* clean any pending stop state */
+    env->resume_as_sreset = 0;
 #endif
     hreg_compute_hflags(env);
     env->reserve_addr = (target_ulong)-1ULL;
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] target/ppc: cpu_init: Clean up stop state on cpu reset
  2022-06-14  8:29 [PATCH] target/ppc: cpu_init: Clean up stop state on cpu reset Frederic Barrat
@ 2022-06-14 12:44 ` Fabiano Rosas
  2022-06-15  5:23 ` Cédric Le Goater
  1 sibling, 0 replies; 5+ messages in thread
From: Fabiano Rosas @ 2022-06-14 12:44 UTC (permalink / raw)
  To: Frederic Barrat, clg, danielhb413, qemu-ppc, qemu-devel

Frederic Barrat <fbarrat@linux.ibm.com> writes:

> The 'resume_as_sreset' attribute of a cpu can be set when a thread is
> entering a stop state on ppc books. It causes the thread to be
> re-routed to vector 0x100 when woken up by an exception. So it must be
> cleaned on reset or a thread might be re-routed unexpectedly after a
> reset, when it was not in a stop state and/or when the appropriate
> exception handler isn't set up yet.
>
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>

Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] target/ppc: cpu_init: Clean up stop state on cpu reset
  2022-06-14  8:29 [PATCH] target/ppc: cpu_init: Clean up stop state on cpu reset Frederic Barrat
  2022-06-14 12:44 ` Fabiano Rosas
@ 2022-06-15  5:23 ` Cédric Le Goater
  2022-06-15  7:17   ` Frederic Barrat
  1 sibling, 1 reply; 5+ messages in thread
From: Cédric Le Goater @ 2022-06-15  5:23 UTC (permalink / raw)
  To: Frederic Barrat, danielhb413, qemu-ppc, qemu-devel

On 6/14/22 10:29, Frederic Barrat wrote:
> The 'resume_as_sreset' attribute of a cpu can be set when a thread is
> entering a stop state on ppc books. It causes the thread to be
> re-routed to vector 0x100 when woken up by an exception. So it must be
> cleaned on reset or a thread might be re-routed unexpectedly after a
> reset, when it was not in a stop state and/or when the appropriate
> exception handler isn't set up yet.

What is the test scenario ? and what are the symptoms ?


> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>

Reviewed-by: Cédric Le Goater <clg@kaod.org>


> ---
> 
> I didn't find an appropriate commit to add a "Fixes:". It originates
> when adding support for power management states but the code looked
> quite different in 2016 and it's not clear whether we were supporting
> reset then.

It was added when we needed some support for the POWER8 stop states.
About that time.

Thanks,

C.

> 
> target/ppc/cpu_init.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
> index 0f891afa04..c16cb8dbe7 100644
> --- a/target/ppc/cpu_init.c
> +++ b/target/ppc/cpu_init.c
> @@ -7186,6 +7186,9 @@ static void ppc_cpu_reset(DeviceState *dev)
>           }
>           pmu_update_summaries(env);
>       }
> +
> +    /* clean any pending stop state */
> +    env->resume_as_sreset = 0;
>   #endif
>       hreg_compute_hflags(env);
>       env->reserve_addr = (target_ulong)-1ULL;



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] target/ppc: cpu_init: Clean up stop state on cpu reset
  2022-06-15  5:23 ` Cédric Le Goater
@ 2022-06-15  7:17   ` Frederic Barrat
  2022-06-15  9:40     ` Cédric Le Goater
  0 siblings, 1 reply; 5+ messages in thread
From: Frederic Barrat @ 2022-06-15  7:17 UTC (permalink / raw)
  To: Cédric Le Goater, danielhb413, qemu-ppc, qemu-devel



On 15/06/2022 07:23, Cédric Le Goater wrote:
> On 6/14/22 10:29, Frederic Barrat wrote:
>> The 'resume_as_sreset' attribute of a cpu can be set when a thread is
>> entering a stop state on ppc books. It causes the thread to be
>> re-routed to vector 0x100 when woken up by an exception. So it must be
>> cleaned on reset or a thread might be re-routed unexpectedly after a
>> reset, when it was not in a stop state and/or when the appropriate
>> exception handler isn't set up yet.
> 
> What is the test scenario ? and what are the symptoms ?


I was hitting it because of another bug in skiboot: if you have many 
chips, we spend way too much time in add_opal_interrupts(), especially 
on powernv10 (I'm working on a separate patch in skiboot to fix that). 
Sufficiently so that the watchdog timer resets the system. When it 
happens, all the secondary threads are in stopped state, only the main 
thread is working. That's how I was reproducing.

What happens after the reset can vary a bit due to timing, but the most 
likely scenario is that we go through another primary thread election in 
skiboot. If the primary thread is the same as before, then there's no 
problem. If it's a different primary, then it will enter 
main_cpu_entry() while the other threads wait as secondaries. At some 
point, the primary thread (which still carries the wrong 
resume_as_sreset value from before reset) will enable the decrementer 
interrupt. The vector for the decrementer exception 0x900 is defined, so 
that shouldn't be a problem. However, because of the wrong 
resume_as_sreset value, it is re-routed to vector 0x100, which is still 
defined as the default boot-time handler, which is the entry point for 
BML. So the thread restarts as new, but this time it will be elected 
secondary. And we end up with all threads waiting as secondaries and a 
system stuck. All that happen before we've init the uart, so there's not 
a single trace on the console. Fun :-)

   Fred


> 
> 
>> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> 
> Reviewed-by: Cédric Le Goater <clg@kaod.org>
> 
> 
>> ---
>>
>> I didn't find an appropriate commit to add a "Fixes:". It originates
>> when adding support for power management states but the code looked
>> quite different in 2016 and it's not clear whether we were supporting
>> reset then.
> 
> It was added when we needed some support for the POWER8 stop states.
> About that time.
> 
> Thanks,
> 
> C.
> 
>>
>> target/ppc/cpu_init.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
>> index 0f891afa04..c16cb8dbe7 100644
>> --- a/target/ppc/cpu_init.c
>> +++ b/target/ppc/cpu_init.c
>> @@ -7186,6 +7186,9 @@ static void ppc_cpu_reset(DeviceState *dev)
>>           }
>>           pmu_update_summaries(env);
>>       }
>> +
>> +    /* clean any pending stop state */
>> +    env->resume_as_sreset = 0;
>>   #endif
>>       hreg_compute_hflags(env);
>>       env->reserve_addr = (target_ulong)-1ULL;
> 
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] target/ppc: cpu_init: Clean up stop state on cpu reset
  2022-06-15  7:17   ` Frederic Barrat
@ 2022-06-15  9:40     ` Cédric Le Goater
  0 siblings, 0 replies; 5+ messages in thread
From: Cédric Le Goater @ 2022-06-15  9:40 UTC (permalink / raw)
  To: Frederic Barrat, danielhb413, qemu-ppc, qemu-devel

On 6/15/22 09:17, Frederic Barrat wrote:
> 
> 
> On 15/06/2022 07:23, Cédric Le Goater wrote:
>> On 6/14/22 10:29, Frederic Barrat wrote:
>>> The 'resume_as_sreset' attribute of a cpu can be set when a thread is
>>> entering a stop state on ppc books. It causes the thread to be
>>> re-routed to vector 0x100 when woken up by an exception. So it must be
>>> cleaned on reset or a thread might be re-routed unexpectedly after a
>>> reset, when it was not in a stop state and/or when the appropriate
>>> exception handler isn't set up yet.
>>
>> What is the test scenario ? and what are the symptoms ?
> 
> 
> I was hitting it because of another bug in skiboot: if you have many chips, we spend way too much time in add_opal_interrupts(), especially on powernv10 (I'm working on a separate patch in skiboot to fix that). Sufficiently so that the watchdog timer resets the system. When it happens, all the secondary threads are in stopped state, only the main thread is working. That's how I was reproducing.
> 
> What happens after the reset can vary a bit due to timing, but the most likely scenario is that we go through another primary thread election in skiboot. If the primary thread is the same as before, then there's no problem. If it's a different primary, then it will enter main_cpu_entry() while the other threads wait as secondaries. At some point, the primary thread (which still carries the wrong resume_as_sreset value from before reset) will enable the decrementer interrupt. The vector for the decrementer exception 0x900 is defined, so that shouldn't be a problem. However, because of the wrong resume_as_sreset value, it is re-routed to vector 0x100, which is still defined as the default boot-time handler, which is the entry point for BML. So the thread restarts as new, but this time it will be elected secondary. And we end up with all threads waiting as secondaries and a system stuck. All that happen before we've init the uart, so there's not a single trace on the console. 
> Fun :-)

Great analysis !

I think this deserve a v2 just to put in the commit log what you
just wrote :)

Thanks,

C.


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-06-15  9:43 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-14  8:29 [PATCH] target/ppc: cpu_init: Clean up stop state on cpu reset Frederic Barrat
2022-06-14 12:44 ` Fabiano Rosas
2022-06-15  5:23 ` Cédric Le Goater
2022-06-15  7:17   ` Frederic Barrat
2022-06-15  9:40     ` Cédric Le Goater

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.