From: "Matheus K. Ferst" <matheus.ferst@eldorado.org.br>
To: "Cédric Le Goater" <clg@kaod.org>,
	qemu-devel@nongnu.org, qemu-ppc@nongnu.org
Cc: danielhb413@gmail.com, david@gibson.dropbear.id.au,
	groug@kaod.org, fbarrat@linux.ibm.com, alex.bennee@linaro.org,
	farosas@linux.ibm.com
Subject: Re: [RFC PATCH v2 00/29] PowerPC interrupt rework
Date: Mon, 3 Oct 2022 12:45:39 -0300
Message-ID: <03ad8964-a7c1-5b26-00aa-3b028296e0d0@eldorado.org.br>
In-Reply-To: <9b310cf0-6140-a397-0f7d-a752b1ba4072@kaod.org>

On 28/09/2022 14:31, Cédric Le Goater wrote:
> Hello Matheus,
> 
> On 9/27/22 22:15, Matheus Ferst wrote:
>> Link to v1: 
>> https://lists.gnu.org/archive/html/qemu-ppc/2022-08/msg00370.html
>> This series is also available as a git branch: 
>> https://github.com/PPC64/qemu/tree/ferst-interrupt-fix-v2
> 
> This is impressive work on QEMU PPC.
> 
>> This version addresses Fabiano's feedback and fixes some issues found
>> with the tests suggested by Cédric. While working on it, I found two
>> intermittent problems on master:
>>
>>   i) ~10% of boots with pSeries and 970/970mp/POWER5+ hard lockup after
> 
> These CPUs never got real attention with KVM. The FW was even broken
> before 7.0.
> 
>>      either SCSI or network initialization when using -smp 4. With
>>      -smp 2, the problem is harder to reproduce but still happens, and I
>>      couldn't reproduce with thread=single.
>> ii) ~52% of KVM guest initializations on PowerNV hang in different parts
>>      of the boot process when using more than one CPU.
> 
> Do you mean when the guest is SMP or the host ?

I should've added more details: this percentage is from testing powernv9
with "-smp 4" and a pSeries-POWER9 guest with "-smp 4", but I can also
reproduce it with a multithreaded L0 and a single-threaded L1. The
firmware prints messages like:

Could not set special wakeup on 0:1: timeout waiting for SPECIAL_WKUP_DONE.

when it hangs, but I also see this message on some successful boots.
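
For reference, the reproduction setup is roughly the following (a sketch:
the kernel/rootfs names match the command lines quoted further down, and
the L1 memory size is an arbitrary choice):

   # L0: PowerNV host emulated with MTTCG
   ./qemu-system-ppc64 -M powernv9 -cpu POWER9 -accel tcg,thread=multi \
       -m 8G -smp 4 -vga none -nographic -kernel zImage \
       -append 'console=hvc0' -initrd rootfs.cpio.xz

   # L1: inside the L0 guest, a pSeries guest accelerated with KVM-HV
   qemu-system-ppc64 -M pseries -cpu POWER9 -accel kvm \
       -m 2G -smp 4 -vga none -nographic -kernel zImage \
       -append 'console=hvc0' -initrd rootfs.cpio.xz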

> 
>> With the complete series applied, I couldn't reproduce (i) anymore,
> 
> Super ! Models are getting better. This is nice for the 970.
> 
>> and (ii) became a little more frequent (~58%).
> 
> Have you checked 'info pic' ? XIVE is in charge of vCPU scheduling.

I don't have much knowledge in this area yet, so I don't know what to
look for, but in case it's useful, here is the output of the command when
the problem occurs with a 4-core L0 and a single-core L1:

(qemu) info pic
CPU[0000]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR  W2
CPU[0000]: USER    00   00  00    00   00  00  00   00  00000000
CPU[0000]:   OS    00   00  00    ff   ff  00  ff   ff  00000000
CPU[0000]: POOL    00   00  00    ff   00  00  00   00  00000000
CPU[0000]: PHYS    00   ff  00    00   00  00  00   ff  80000000
CPU[0001]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR  W2
CPU[0001]: USER    00   00  00    00   00  00  00   00  00000000
CPU[0001]:   OS    00   00  00    ff   ff  00  ff   ff  00000000
CPU[0001]: POOL    00   00  00    ff   00  00  00   00  00000001
CPU[0001]: PHYS    00   ff  00    00   00  00  00   ff  80000000
CPU[0002]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR  W2
CPU[0002]: USER    00   00  00    00   00  00  00   00  00000000
CPU[0002]:   OS    00   00  00    ff   ff  00  ff   ff  00000000
CPU[0002]: POOL    00   00  00    ff   00  00  00   00  00000002
CPU[0002]: PHYS    00   ff  00    00   00  00  00   ff  80000000
CPU[0003]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR  W2
CPU[0003]: USER    00   00  00    00   00  00  00   00  00000000
CPU[0003]:   OS    00   ff  00    00   ff  00  ff   ff  00000004
CPU[0003]: POOL    00   00  00    ff   00  00  00   00  00000003
CPU[0003]: PHYS    00   ff  00    00   00  00  00   ff  80000000
XIVE[0] #0 Source 00000000 .. 000fffff
   00000014 MSI --
   00000015 MSI --
   00000016 MSI --
   00000017 MSI --
   00000018 MSI --
   00000019 MSI --
   0000001a MSI --
   0000001b MSI --
   0000001e MSI P-
   00000023 MSI --
   00000024 MSI --
   00000025 MSI --
   00000026 MSI --
XIVE[0] #0 EAT 00000000 .. 000fffff
   00000014   end:00/000f data:00000010
   00000015   end:00/0017 data:00000010
   00000016   end:00/001f data:00000010
   00000017   end:00/0027 data:00000010
   00000018   end:00/004e data:00000010
   00000019   end:00/004e data:00000012
   0000001a   end:00/004e data:0000001b
   0000001b   end:00/004e data:00000013
   0000001e   end:00/004e data:00000016
   00000023   end:00/004e data:00000017
   00000024   end:00/004e data:00000018
   00000025   end:00/004e data:00000019
   00000026   end:00/004e data:0000001a
   000fb000   end:00/001f data:00000030
   000fb001   end:00/0027 data:00000031
   000fb002   end:00/000f data:00000032
   000fb003   end:00/000f data:00000033
   000fb004   end:00/0017 data:00000034
   000fb005   end:00/001f data:00000035
   000fb006   end:00/0027 data:00000036
   000fb7fe   end:00/000f data:00000029
   000fb7ff   end:00/0017 data:0000002a
   000fbffe   end:00/001f data:00000027
   000fbfff   end:00/0027 data:00000028
   000fcffe   end:00/000f data:00000025
   000fcfff   end:00/0017 data:00000026
   000fd000   end:00/001f data:00000037
   000fd001   end:00/000f data:00000038
   000fd002   end:00/0017 data:00000039
   000fd003   end:00/001f data:0000003a
   000fd004   end:00/0027 data:0000003b
   000fd7fe   end:00/001f data:00000023
   000fd7ff   end:00/0027 data:00000024
   000fdffe   end:00/000f data:00000021
   000fdfff   end:00/0017 data:00000022
   000feffe   end:00/001f data:0000001f
   000fefff   end:00/0027 data:00000020
   000ffff0   end:00/000f data:00000011
   000ffff1   end:00/0017 data:00000012
   000ffff2   end:00/001f data:00000013
   000ffff3   end:00/0027 data:00000014
   000ffff4   end:00/000f data:00000015
   000ffff5   end:00/0017 data:00000016
   000ffff6   end:00/001f data:00000017
   000ffff7   end:00/0027 data:00000018
   000ffff8   end:00/000f data:00000019
   000ffff9   end:00/0017 data:0000001a
   000ffffa   end:00/001f data:0000001b
   000ffffb   end:00/0027 data:0000001c
   000ffffc   end:00/000f data:0000001d
   000ffffd   end:00/0017 data:0000001e
XIVE[0] #0 ENDT
   0000000f -Q vqnb---f prio:7 nvt:00/0080 eq:@03400000   825/16384 ^1 [ 8000004f 8000004f 8000004f 8000004f 8000004f ^00000000 ]
   00000017 -Q vqnb---f prio:7 nvt:00/0084 eq:@03750000  1048/16384 ^1 [ 8000001e 8000001e 8000001e 8000001e 8000001e ^00000000 ]
   0000001f -Q vqnb---f prio:7 nvt:00/0088 eq:@037f0000   154/16384 ^1 [ 8000003a 8000003a 8000003a 8000003a 8000003a ^00000000 ]
   00000027 -Q vqnb---f prio:7 nvt:00/008c eq:@038a0000   340/16384 ^1 [ 80000014 80000014 80000014 80000014 8000003b ^00000000 ]
   0000004e -Q vqnbeu-- prio:6 nvt:00/0004 eq:@1d170000  1104/16384 ^1 [ 80000016 80000016 80000016 80000016 80000016 ^00000000 ]
   0000004f -Q v--be-s- prio:0 nvt:00/0000
XIVE[0] #0 END Escalation EAT
   0000004e -Q    end:00/004f data:00000000
   0000004f P-    end:00/000f data:0000004f
XIVE[0] #0 NVTT 00000000 .. 0007ffff
   00000000 end:00/0028 IPB:00
   00000001 end:00/0030 IPB:00
   00000002 end:00/0038 IPB:00
   00000003 end:00/0040 IPB:00
   00000004 end:00/0048 IPB:02
   00000080 end:00/0008 IPB:00
   00000084 end:00/0010 IPB:00
   00000088 end:00/0018 IPB:00
   0000008c end:00/0020 IPB:00
PSIHB Source 000ffff0 .. 000ffffd
   000ffff0 LSI --
   000ffff1 LSI --
   000ffff2 LSI --
   000ffff3 LSI --
   000ffff4 LSI --
   000ffff5 LSI --
   000ffff6 LSI --
   000ffff7 LSI --
   000ffff8 LSI --
   000ffff9 LSI --
   000ffffa LSI --
   000ffffb LSI --
   000ffffc LSI --
   000ffffd LSI --
PHB4[0:0] Source 000fe000 .. 000fefff  @6030203110100
   00000ffe LSI --
   00000fff LSI --
PHB4[0:5] Source 000fb000 .. 000fb7ff  @6030203110228
   00000000 MSI --
   00000001 MSI --
   00000002 MSI --
   00000003 MSI --
   00000004 MSI --
   00000005 MSI --
   00000006 MSI --
   000007fe LSI --
   000007ff LSI --
PHB4[0:4] Source 000fb800 .. 000fbfff  @6030203110220
   000007fe LSI --
   000007ff LSI --
PHB4[0:3] Source 000fc000 .. 000fcfff  @6030203110218
   00000ffe LSI --
   00000fff LSI --
PHB4[0:2] Source 000fd000 .. 000fd7ff  @6030203110210
   00000000 MSI --
   00000001 MSI --
   00000002 MSI --
   00000003 MSI --
   00000004 MSI --
   000007fe LSI --
   000007ff LSI --
PHB4[0:1] Source 000fd800 .. 000fdfff  @6030203110208
   000007fe LSI --
   000007ff LSI --

> Could you please check with powersave=off in the host kernel also ?
> 

It still hangs with this option.
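
For reference, powersave=off goes on the L0 kernel command line; a
sketch, assuming the L0 kernel is booted with -kernel/-append as in the
command lines quoted below:

   ./qemu-system-ppc64 -M powernv9 -cpu POWER9 -accel tcg,thread=multi \
       -m 8G -smp 4 -kernel zImage -initrd rootfs.cpio.xz \
       -append 'console=hvc0 powersave=off' -nographic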

>> I've tested each patch of this series with [1], modified to use -smp for
>> machines that support more than one CPU. The machines I can currently
>> boot with FreeBSD (970/970mp/POWER5+/POWER7/POWER8/POWER9 pSeries,
>> POWER8/POWER9 PowerNV, and mpc8544ds) were tested with the images from
>> [2] and still boot after applying the patch series. Booting nested
>> guests inside a TCG pSeries machine also seems to be working fine.
>>
>> Using command lines like:
>>
>> ./qemu-system-ppc64 -M powernv9 -cpu POWER9 -accel tcg,thread=multi \
>>                  -m 8G -smp $SMP -vga none -nographic -kernel zImage \
>>                  -append 'console=hvc0' -initrd rootfs.cpio.xz \
>>                  -serial pipe:pipe -monitor unix:mon,server,nowait
>>
>> and
>>
>> ./qemu-system-ppc64 -M pseries -cpu POWER9 -accel tcg,thread=multi \
>>                  -m 8G -smp $SMP -vga none -nographic -kernel zImage \
>>                  -append 'console=hvc0' -initrd rootfs.cpio.xz \
>>                  -serial pipe:pipe -monitor unix:mon,server,nowait
>>
>> to measure the time to boot, login, and shut down a compressed kernel
>> with a buildroot initramfs, over 100 iterations we get (times in
>> seconds):
>>
>> +-----+------------------------------+-----------------------------+
>> |     |            PowerNV           |           pSeries           |
>> |-smp |------------------------------+-----------------------------+
>> |     |     master    | patch series |    master    | patch series |
>> +-----+------------------------------+-----------------------------+
>> |  1  |  45.84 ± 0.92 | 38.08 ± 0.66 | 23.56 ± 1.16 | 23.76 ± 1.04 |
>> |  2  |  80.21 ± 8.03 | 40.81 ± 0.45 | 26.59 ± 0.92 | 26.88 ± 0.99 |
>> |  4  | 115.98 ± 9.85 | 38.80 ± 0.44 | 28.83 ± 0.84 | 28.46 ± 0.94 |
>> |  6  | 199.14 ± 6.36 | 39.32 ± 0.50 | 29.22 ± 0.78 | 29.45 ± 0.86 |
>> |  8  | 47.85 ± 27.50 | 38.98 ± 0.49 | 29.63 ± 0.80 | 29.60 ± 0.78 |
>> +-----+------------------------------+-----------------------------+
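
A minimal sketch of what each timed iteration looks like; the actual
harness is the script from [1], and this assumes a guest image that logs
in and powers off by itself:

   for i in $(seq 1 100); do
       /usr/bin/time -a -o boot-times.log \
           ./qemu-system-ppc64 -M powernv9 -cpu POWER9 \
           -accel tcg,thread=multi -m 8G -smp $SMP -vga none -nographic \
           -kernel zImage -append 'console=hvc0' -initrd rootfs.cpio.xz
   done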
>>
>> These results show that the problem reported in [3] is solved, while
> 
> Yes. Nice work ! The PowerNV results with -smp 8 on master are unexpected.
> Did you do some profiling also ?
> 

We noticed that in the original thread, when Frederic reported the
issue: it happens when -smp >= $(nproc). I haven't looked too deeply
into this case; maybe some optimization in the Linux mutex
implementation helps in the higher-contention case?

>> pSeries boot time is essentially unchanged.
>>
>>
>> With a non-compressed kernel, the difference with PowerNV is smaller,
>> and pSeries stays the same:
>>
>> +-----+------------------------------+-----------------------------+
>> |     |            PowerNV           |           pSeries           |
>> |-smp |------------------------------+-----------------------------+
>> |     |     master    | patch series |    master    | patch series |
>> +-----+------------------------------+-----------------------------+
>> |  1  |  42.17 ± 0.92 | 38.13 ± 0.59 | 23.15 ± 1.02 | 23.46 ± 1.02 |
>> |  2  |  55.72 ± 3.54 | 40.30 ± 0.56 | 26.26 ± 0.82 | 26.38 ± 0.80 |
>> |  4  |  67.09 ± 3.02 | 38.26 ± 0.47 | 28.36 ± 0.77 | 28.19 ± 0.78 |
>> |  6  |  98.96 ± 2.49 | 39.01 ± 0.38 | 28.68 ± 0.75 | 29.02 ± 0.88 |
>> |  8  |  39.68 ± 0.42 | 38.44 ± 0.41 | 29.24 ± 0.81 | 29.44 ± 0.75 |
>> +-----+------------------------------+-----------------------------+
>>
>> Finally, using command lines like
>>
>> ./qemu-system-ppc64 -M powernv9 -cpu POWER9 -accel tcg,thread=multi \
>>      -m 8G -smp 4 -device virtio-scsi-pci -boot c -vga none -nographic \
>>      -device nvme,bus=pcie.2,addr=0x0,drive=drive0,serial=1234 \
>>      -drive file=rootfs.ext2,if=none,id=drive0,format=raw,cache=none \
>>      -snapshot -serial pipe:pipe -monitor unix:mon,server,nowait \
>>      -kernel zImage -append 'console=hvc0 rootwait root=/dev/nvme0n1' \
>>      -device virtio-net-pci,netdev=br0,mac=52:54:00:12:34:57,bus=pcie.0 \
>>      -netdev bridge,id=br0
>>
>> and
>>
>> ./qemu-system-ppc64 -M pseries -cpu POWER9 -accel tcg,thread=multi \
>>      -m 8G -smp 4 -device virtio-scsi-pci -boot c -vga none -nographic \
>>      -drive file=rootfs.ext2,if=scsi,index=0,format=raw -snapshot \
>>      -kernel zImage -append 'console=hvc0 rootwait root=/dev/sda' \
>>      -serial pipe:pipe -monitor unix:mon,server,nowait \
>>      -device virtio-net-pci,netdev=br0,mac=52:54:00:12:34:57 \
>>      -netdev bridge,id=br0
>>
>> to test IO performance, with iperf to test the network and a 4Gb scp
>> transfer to test disk+network, in 100 iterations we saw:
>>
>> +---------------------+---------------+-----------------+
>> |                     |    scp (s)    |   iperf (MB/s)  |
>> +---------------------+---------------+-----------------+
>> |PowerNV master       | 166.91 ± 8.37 | 918.06 ± 114.78 |
>> |PowerNV patch series | 166.25 ± 8.85 | 916.91 ± 107.56 |
>> |pSeries master       | 175.70 ± 8.22 | 958.73 ± 115.09 |
>> |pSeries patch series | 173.62 ± 8.13 | 893.42 ±  87.77 |
>> +---------------------+---------------+-----------------+
> 
> These are SMP machines under high IO load using MTTCG. It means
> that the models are quite robust now.
> 
>> The scp data shows little difference, while the network-only test is
>> a bit slower with the patch series applied (although, given this
>> variation, we'd probably need to repeat the test more times to get a
>> more robust result...)
> 
> You could try with powersave=off.
> 

Not a big difference; with 50 iterations:

+---------------------+---------------+-----------------+
|                     |    scp (s)    |   iperf (MB/s)  |
+---------------------+---------------+-----------------+
|PowerNV master       | 142.73 ± 8.38 | 924.34 ± 353.93 |
|PowerNV patch series | 145.75 ± 9.18 | 874.52 ± 286.21 |
+---------------------+---------------+-----------------+
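
For completeness, the iperf/scp measurements above are along these lines
(a sketch with hypothetical addresses and file names; iperf runs in TCP
mode against a server on the bridged host, and scp copies one large file
onto the guest's root disk):

   # network only: the guest connects to an iperf server on the host
   iperf -c 192.168.1.1 -t 60

   # disk+network: pull a 4Gb file from the host onto the guest's disk
   scp user@192.168.1.1:/srv/test-4g.bin /root/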

Thanks,
Matheus K. Ferst
Instituto de Pesquisas ELDORADO <http://www.eldorado.org.br/>
Software Analyst

