* KVM: PPC: KVM_BOOK3S_32 hard lockup on G4 Mac Mini
@ 2019-01-14 18:40 Mark Cave-Ayland
From: Mark Cave-Ayland @ 2019-01-14 18:40 UTC
  To: kvm-ppc

Hi all,

Over the holidays I've been experimenting with the new Debian powerpc port to see if
it will allow me to test some qemu-system-ppc changes on a second-hand G4 Mac Mini.
Unfortunately I am experiencing hard lockups when trying to boot an OS X ISO in the
guest with KVM enabled.

Debian ports is currently using 4.19, and as its config doesn't enable KVM by
default, I grabbed a vanilla 4.19.13 source from kernel.org and compiled it myself to
make sure that no custom Debian patches could be causing the issue.

When booting an OS X 10.2 ISO the guest starts normally until the point where the
Apple logo disappears and the blue desktop background appears on screen, at which
point the physical console locks up hard. With a background SSH session running I was
able to grab the following from the system logs:


[  977.269291] qemu-system-ppc[1972]: floating point exception (8) at aab8bce4 nip
aab8bce4 lr aab8bca4 code e in libc-2.28.so[aaa89000+1cd000]
[  977.269323] qemu-system-ppc[1972]: code: 80c1001c 81228fd0 7c7d1b78 7fe3fb78
2f890000 419e0018 39200031 7c09071d
[  977.269338] qemu-system-ppc[1972]: code: 7d200026 55291ffe 69290001 44000002
<7c000026> 74091000 7c7f1b78 40820034
[  977.274455] in:imklog[277]: floating point exception (8) at 3c7e74 nip 3c7e74 lr
3c7e3c code e in libpthread-2.28.so[3b1000+25000]
[  977.274588] in:imklog[277]: code: 38000003 81228fd0 7c7b1b78 7fa3eb78 2f890000
419e0018 39200031 7c09071d
[  977.274602] in:imklog[277]: code: 7d200026 55291ffe 69290001 44000002 <7c000026>
74091000 7c7f1b78 40820034
[  977.274675] systemd-journal[169]: floating point exception (8) at 56cb30 nip
56cb30 lr 56caf4 code e in libc-2.28.so[45b000+1cd000]
[  977.274687] systemd-journal[169]: code: 380000ee 81228fd0 7c7a1b78 7f83e378
2f890000 419e0018 39200031 7c09071d
[  977.274701] systemd-journal[169]: code: 7d200026 55291ffe 69290001 44000002
<7c000026> 74091000 7c7f1b78 40820038
[  977.276028] systemd[1]: floating point exception (8) at 84aa84 nip 84aa84 lr
84aa4c code e in libc-2.28.so[739000+1cd000]
[  977.276053] systemd[1]: code: 7fc802a6 93e1002c 3fde000e 3bdee5a8 81228b30
2f890000 409e0064 81228fd0
[  977.276067] systemd[1]: code: 380000ee 2f890000 409e0034 44000002 <7c000026>
74091000 7c7f1b78 408200d0
[  977.276096] systemd[1]: floating point exception (8) at a1f2e0 nip a1f2e0 lr
1003f4 code e in systemd[93d000+141000]
[  977.276105] systemd[1]: code: 48009861 4bfffedc 80fe8004 811e83f8 7fe3fb78
38c0008b 80be8008 38800000
[  977.276119] systemd[1]: code: 38e70598 4cc63182 48009839 4bffff30 <9421fe90>
7c0802a6 429f0005 9361015c
[  977.276137] systemd[1]: floating point exception (8) at a1f2e0 nip a1f2e0 lr
1003f4 code e in systemd[93d000+141000]
[  977.276143] systemd[1]: code: 48009861 4bfffedc 80fe8004 811e83f8 7fe3fb78
38c0008b 80be8008 38800000
[  977.276155] systemd[1]: code: 38e70598 4cc63182 48009839 4bffff30 <9421fe90>
7c0802a6 429f0005 9361015c
[  977.276172] systemd[1]: floating point exception (8) at a1f2e0 nip a1f2e0 lr
1003f4 code e in systemd[93d000+141000]
[  977.276179] systemd[1]: code: 48009861 4bfffedc 80fe8004 811e83f8 7fe3fb78
38c0008b 80be8008 38800000
[  977.276191] systemd[1]: code: 38e70598 4cc63182 48009839 4bffff30 <9421fe90>
7c0802a6 429f0005 9361015c
[  977.276209] systemd[1]: floating point exception (8) at a1f2e0 nip a1f2e0 lr
1003f4 code e in systemd[93d000+141000]
[  977.276215] systemd[1]: code: 48009861 4bfffedc 80fe8004 811e83f8 7fe3fb78
38c0008b 80be8008 38800000
[  977.276227] systemd[1]: code: 38e70598 4cc63182 48009839 4bffff30 <9421fe90>
7c0802a6 429f0005 9361015c
[  977.276245] systemd[1]: floating point exception (8) at a1f2e0 nip a1f2e0 lr
1003f4 code e in systemd[93d000+141000]
[  977.276251] systemd[1]: code: 48009861 4bfffedc 80fe8004 811e83f8 7fe3fb78
38c0008b 80be8008 38800000
[  977.276263] systemd[1]: code: 38e70598 4cc63182 48009839 4bffff30 <9421fe90>
7c0802a6 429f0005 9361015c
[  977.276281] systemd[1]: floating point exception (8) at a1f2e0 nip a1f2e0 lr
1003f4 code e in systemd[93d000+141000]
[  977.276287] systemd[1]: code: 48009861 4bfffedc 80fe8004 811e83f8 7fe3fb78
38c0008b 80be8008 38800000
[  977.276299] systemd[1]: code: 38e70598 4cc63182 48009839 4bffff30 <9421fe90>
7c0802a6 429f0005 9361015c
[  978.305332] systemd: 35 output lines suppressed due to ratelimiting
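
For anyone reading the dump above: as far as I understand the format, the word in
angle brackets on each "code:" line is the instruction at the reported nip. Below is
a minimal Python sketch (not part of the original report; the names marked_words,
decode and KNOWN are made up for illustration) that pulls those words out of a saved
log and splits off the PowerPC opcode fields. The small mnemonic table only covers
the words that occur in this log.

```python
import re

# The kernel brackets one word per instruction dump: <xxxxxxxx>.
MARKED_WORD = re.compile(r"<([0-9a-f]{8})>")

# Illustrative table, limited to encodings seen in this log.
# Key: (primary opcode, extended opcode or None).
# Note: mfocrf shares extended opcode 19 with mfcr; this sketch does not
# distinguish the two.
KNOWN = {
    (17, None): "sc",
    (37, None): "stwu",
    (31, 19): "mfcr",
}


def marked_words(log_text):
    """Return the distinct bracketed words, in order of first appearance."""
    seen = []
    for word in MARKED_WORD.findall(log_text):
        if word not in seen:
            seen.append(word)
    return seen


def decode(word_hex):
    """Split a 32-bit instruction word into (opcode, extended opcode, name)."""
    word = int(word_hex, 16)
    opcode = word >> 26  # top 6 bits
    # Only opcode 31 is given an extended opcode here (bits 21..30).
    xo = (word >> 1) & 0x3FF if opcode == 31 else None
    return opcode, xo, KNOWN.get((opcode, xo), "unknown")


if __name__ == "__main__":
    sample = (
        "code: 7d200026 55291ffe 69290001 44000002 <7c000026> 74091000\n"
        "code: 38e70598 4cc63182 48009839 4bffff30 <9421fe90> 7c0802a6\n"
    )
    for w in marked_words(sample):
        print(w, decode(w))
```

If I am reading the encodings correctly, the libc and libpthread faults are all
reported on the word immediately following 44000002 (sc), i.e. on the first
instruction executed after returning from a system call.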


Figuring that this might be a regression, I started to work my way backwards through
various source tarballs using the Debian config as a starting point to find out when
this last worked, and came up with the results below:

4.4.169
- last known working version, OS X boots to the installer

4.5.7 | 4.6.7 | 4.7.9
- all build, but loading initrd modules fails on reboot with multiple "module_32: XXX
unknown ADD relocation YYYY" messages

4.8.17
- first known bad version which locks up


So firstly: can anyone think of any changes between 4.4 and 4.8 that might be
responsible for the errors that appear in the logs above? That would certainly save
quite a bit of time bisecting this on real hardware.

And secondly: if not, can someone point me towards a solution for the "module_32: XXX
unknown ADD relocation YYYY" messages? That would let me manually patch the sources
between 4.4.169 and 4.8.17 and hopefully come back to the list with a more accurate
bisection.


Many thanks,

Mark.


* Re: KVM: PPC: KVM_BOOK3S_32 hard lockup on G4 Mac Mini
From: Mark Cave-Ayland @ 2019-01-22 18:27 UTC
  To: kvm-ppc

On 14/01/2019 18:40, Mark Cave-Ayland wrote:

> When booting an OS X 10.2 ISO the guest starts normally until the point where the
> Apple logo disappears and the blue desktop background appears on screen, at which
> point the physical console locks up hard. With a background SSH session running I was
> able to grab the following from the system logs:
> 
> [...]

After a bit of digging, I managed to figure out that the "module_32: XXX
unknown ADD relocation YYYY" messages occurred because the -fno-PIE flag was
incorrectly being omitted from the powerpc kernel build around the time that the
lockup bug was introduced.

Manually fixing up the Makefile for each build enabled me to run a git bisect over
the past week, which has now finished and indicates that the first bad commit which
causes the hard lockup is:


$ git bisect bad
8792468da5e12e77e76e1edf081acf0392abb331 is the first bad commit
commit 8792468da5e12e77e76e1edf081acf0392abb331
Author: Cyril Bur <cyrilbur@gmail.com>
Date:   Mon Feb 29 17:53:49 2016 +1100

    powerpc: Add the ability to save FPU without giving it up

    This patch adds the ability to be able to save the FPU registers to the
    thread struct without giving up (disabling the facility) next time the
    process returns to userspace.

    This patch optimises the thread copy path (as a result of a fork() or
    clone()) so that the parent thread can return to userspace with hot
    registers avoiding a possibly pointless reload of FPU register state.

    Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

:040000 040000 c7b445da4614daa38172ad5ef9e92e9028ee503f
3e56b9aee12a292f386a37c99d45ca07c968125b M      arch


So it seems this has been broken on 32-bit KVM PR for some time. I had a quick browse
of the diff and I'm wondering if the patch is also missing a corresponding update to
kvmppc_giveup_ext() in book3s_pr.c somehow?


ATB,

Mark.
