linuxppc-dev.lists.ozlabs.org archive mirror
* coherency issue observed after hotplug on POWER8
@ 2021-09-21 21:03 Thadeu Lima de Souza Cascardo
  2021-09-24 17:17 ` Naveen N. Rao
  0 siblings, 1 reply; 3+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2021-09-21 21:03 UTC
  To: linuxppc-dev

Hi, there.

We have been investigating an issue we have observed on POWER8 POWERNV systems.
When running the kernel selftests reuseport_bpf_cpu after a CPU hotplug, we see
crashes, in different forms. [1]
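
For anyone trying to reproduce this: a minimal sketch of the offline/online
cycle, assuming it is driven through the standard sysfs file
/sys/devices/system/cpu/cpuN/online. The helper names are made up for
illustration, and the write needs root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the sysfs path that controls whether a CPU is online.
 * Returns 0 on success, -1 if the buffer is too small. */
int sketch_cpu_online_path(char *buf, size_t len, unsigned int cpu)
{
    int n;

    assert(buf != NULL && len > 0);
    n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%u/online", cpu);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Write "0" (offline) or "1" (online) to the CPU's sysfs file.
 * Returns 0 on success, -1 on any error (e.g. not root, no such CPU). */
int sketch_set_cpu_online(unsigned int cpu, int online)
{
    char path[64];
    FILE *f;
    int rc;

    if (sketch_cpu_online_path(path, sizeof(path), cpu) != 0)
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;

    rc = (fputs(online ? "1" : "0", f) == EOF) ? -1 : 0;
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

Calling sketch_set_cpu_online(0x9f, 0) followed by
sketch_set_cpu_online(0x9f, 1) would cycle CPU 0x9f before the selftest is run.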

I managed to get xmon on that trap, and did some debugging. [2] I tried to dump
the BPF JIT code, and it looks different when dumped from CPU#0 and CPU#0x9f
(the one that was hotplugged, offlined, then onlined).

Here is my partial analysis [3]. Basically, the BPF JIT fills a page with
invalid instructions (traps, in the ppc64 case) and puts the BPF program at a
random offset within the page. On the hotplugged CPU, which was the one
that compiled the program, the page had the expected contents (the BPF program
started at the offset used to run the program). On the other CPU (in many
cases, CPU #0), the same memory address/page had different contents, with the
program starting at a different offset.
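
To make the layout concrete, here is a simplified user-space sketch of the
idea, not the kernel code itself: the real allocation path also places a
header at the start of the page and applies its own alignment rules. The
function names, the fixed 4096-byte page and the use of rand() are
assumptions made for illustration; 0x7fe00008 is the ppc trap encoding.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u        /* assumed page size for this sketch */
#define PPC_TRAP_INSN    0x7fe00008u  /* ppc "trap" instruction encoding */

/* Fill the whole page with traps, then copy the program to a random
 * instruction-aligned offset. Returns the chosen byte offset. */
size_t sketch_place_program(uint32_t *page, const uint32_t *prog,
                            size_t prog_insns, unsigned int seed)
{
    size_t page_insns = SKETCH_PAGE_SIZE / sizeof(uint32_t);
    size_t hole_insns, start, i;

    assert(prog_insns <= page_insns);

    for (i = 0; i < page_insns; i++)
        page[i] = PPC_TRAP_INSN;

    /* Number of instruction slots left over once the program is placed. */
    hole_insns = page_insns - prog_insns;
    srand(seed);
    start = (size_t)rand() % (hole_insns + 1);

    memcpy(&page[start], prog, prog_insns * sizeof(uint32_t));
    return start * sizeof(uint32_t);
}

/* Return the byte offset of the first non-trap word in the page, or
 * SKETCH_PAGE_SIZE if the page holds only traps. */
size_t sketch_find_program(const uint32_t *page)
{
    size_t page_insns = SKETCH_PAGE_SIZE / sizeof(uint32_t);
    size_t i;

    for (i = 0; i < page_insns; i++)
        if (page[i] != PPC_TRAP_INSN)
            return i * sizeof(uint32_t);
    return SKETCH_PAGE_SIZE;
}
```

With a coherent view of memory, every CPU scanning the page the way
sketch_find_program() does should find the program at the same offset. The
symptom described above is that two CPUs disagree on that offset for the same
address.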

Is this a case of a bug in the micro-architecture or the firmware when doing
the hotplug? Can someone chime in?

Notice that we can't reproduce the same issue on a POWER9 system.

Thanks.
Cascardo.

[1] https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1927076
[2] https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1927076/comments/29
[3] https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1927076/comments/30


* Re: coherency issue observed after hotplug on POWER8
  2021-09-21 21:03 coherency issue observed after hotplug on POWER8 Thadeu Lima de Souza Cascardo
@ 2021-09-24 17:17 ` Naveen N. Rao
  2021-10-21 15:01   ` Krzysztof Kozlowski
  0 siblings, 1 reply; 3+ messages in thread
From: Naveen N. Rao @ 2021-09-24 17:17 UTC
  To: Thadeu Lima de Souza Cascardo, linuxppc-dev

Hi Cascardo,
Thanks for reporting this.


Thadeu Lima de Souza Cascardo wrote:
> Hi, there.
> 
> We have been investigating an issue we have observed on POWER8 POWERNV systems.
> When running the kernel selftests reuseport_bpf_cpu after a CPU hotplug, we see
> crashes, in different forms. [1]

Just to re-confirm: you are only seeing this on P8 powernv, and not in a 
P8 guest/LPAR? I haven't been able to reproduce this on a firestone -- 
can you share more details about your power8 machine?

Also, do you only see this with ubuntu kernels, or are you also able to 
reproduce this with the upstream tree?

> 
> I managed to get xmon on that trap, and did some debugging. [2] I tried to dump
> the BPF JIT code, and it looks different when dumped from CPU#0 and CPU#0x9f
> (the one that was hotplugged, offlined, then onlined).

Next time you reproduce this, can you try dumping the SLBs for the cpus 
(command 'u' in xmon)?

> 
> Here is my partial analysis [3]. Basically, the BPF JIT fills a page with
> invalid instructions (traps, in the ppc64 case) and puts the BPF program at a
> random offset within the page. On the hotplugged CPU, which was the one
> that compiled the program, the page had the expected contents (the BPF program
> started at the offset used to run the program). On the other CPU (in many
> cases, CPU #0), the same memory address/page had different contents, with the
> program starting at a different offset.

From [3], I think fp->aux->jit_data can be NULL if there are subprogs.  
But, I find it interesting that you don't always see the correct 
bpf_func, as reported in comment #25. Can you also try dumping the full 
bpf_prog structure (prog/fp) from xmon?

> 
> Is this a case of a bug in the micro-architecture or the firmware when 
> doing the hotplug? Can someone chime in?

It's possible that something is going wrong when offlining the cpu. Can 
you try booting the kernel with 'powersave=off' and see if the problem 
goes away?
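
To double-check that the option actually took effect after the reboot, one
could look for it in /proc/cmdline. Below is a small sketch of the token
match only; the helper name is hypothetical and reading the file is left to
the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if the whitespace-separated kernel command line contains
 * exactly the given parameter (e.g. "powersave=off") as one token. */
bool sketch_cmdline_has(const char *cmdline, const char *param)
{
    size_t plen = strlen(param);
    const char *p = cmdline;

    assert(plen > 0);

    while (*p) {
        const char *end;

        /* Skip separators before the next token. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;

        /* Find the end of the token. */
        end = p;
        while (*end && *end != ' ' && *end != '\t' && *end != '\n')
            end++;

        if ((size_t)(end - p) == plen && strncmp(p, param, plen) == 0)
            return true;
        p = end;
    }
    return false;
}
```

Matching whole tokens avoids a false positive on a longer parameter that
merely ends in the same text.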

> 
> Notice that we can't reproduce the same issue on a POWER9 system.
> 
> Thanks.
> Cascardo.
> 
> [1] https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1927076
> [2] https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1927076/comments/29
> [3] https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1927076/comments/30
> 

- Naveen



* Re: coherency issue observed after hotplug on POWER8
  2021-09-24 17:17 ` Naveen N. Rao
@ 2021-10-21 15:01   ` Krzysztof Kozlowski
  0 siblings, 0 replies; 3+ messages in thread
From: Krzysztof Kozlowski @ 2021-10-21 15:01 UTC
  To: Naveen N. Rao, Thadeu Lima de Souza Cascardo, linuxppc-dev

On 24/09/2021 19:17, Naveen N. Rao wrote:
> Hi Cascardo,
> Thanks for reporting this.
> 
> 
> Thadeu Lima de Souza Cascardo wrote:
>> Hi, there.
>>
>> We have been investigating an issue we have observed on POWER8 POWERNV systems.
>> When running the kernel selftests reuseport_bpf_cpu after a CPU hotplug, we see
>> crashes, in different forms. [1]
> 
> Just to re-confirm: you are only seeing this on P8 powernv, and not in a 
> P8 guest/LPAR? I haven't been able to reproduce this on a firestone -- 
> can you share more details about your power8 machine?
> 
> Also, do you only see this with ubuntu kernels, or are you also able to 
> reproduce this with the upstream tree?

Let me just cover this part of your email:

It also reproduces on upstream trees (5.11, 5.13, 5.14). See also:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1927076/comments/28

I could not reproduce it on a Power8 LPAR, nor on a Power9 QEMU guest.

Reproduced on a few machines:
IBM, POWER8NVL, 8335-GTB
POWER8, 8001-22C and 8335-GTA

lscpu for the last one:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1927076/comments/15


Best regards,
Krzysztof

