* [xen-unstable-smoke test] 169781: regressions - FAIL @ 2022-04-27 16:38 osstest service owner 2022-04-27 17:10 ` Julien Grall 0 siblings, 1 reply; 12+ messages in thread From: osstest service owner @ 2022-04-27 16:38 UTC (permalink / raw) To: xen-devel flight 169781 xen-unstable-smoke real [real] flight 169785 xen-unstable-smoke real-retest [real] http://logs.test-lab.xenproject.org/osstest/logs/169781/ http://logs.test-lab.xenproject.org/osstest/logs/169785/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-arm64-arm64-xl-xsm 8 xen-boot fail REGR. vs. 169773 Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 15 migrate-support-check fail never pass test-armhf-armhf-xl 15 migrate-support-check fail never pass test-armhf-armhf-xl 16 saverestore-support-check fail never pass version targeted for testing: xen fa6dc0879ffd3dffffaea2837953c7a8761a9ba0 baseline version: xen 163071b1800304c962756789b4ef0ddb978059ba Last test of basis 169773 2022-04-27 08:01:54 Z 0 days Testing same since 169781 2022-04-27 12:01:52 Z 0 days 1 attempts ------------------------------------------------------------ People who touched revisions under test: David Vrabel <dvrabel@amazon.co.uk> Julien Grall <jgrall@amazon.com> jobs: build-arm64-xsm pass build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-arm64-arm64-xl-xsm fail test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass test-amd64-amd64-libvirt pass ------------------------------------------------------------ sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. ------------------------------------------------------------ commit fa6dc0879ffd3dffffaea2837953c7a8761a9ba0 Author: David Vrabel <dvrabel@amazon.co.uk> Date: Tue Apr 26 10:33:01 2022 +0200 page_alloc: assert IRQs are enabled in heap alloc/free Heap pages can only be safely allocated and freed with interrupts enabled as they may require a TLB flush which may send IPIs (on x86). Normally spinlock debugging would catch calls from the incorrect context, but not from stop_machine_run() action functions as these are called with spin lock debugging disabled. Enhance the assertions in alloc_xenheap_pages() and alloc_domheap_pages() to check interrupts are enabled. For consistency the same asserts are used when freeing heap pages. As an exception, when only 1 PCPU is online, allocations are permitted with interrupts disabled as any TLB flushes would be local only. This is necessary during early boot. Signed-off-by: David Vrabel <dvrabel@amazon.co.uk> Reviewed-by: Jan Beulich <jbeulich@suse.com> commit fbd2445558beff90eb9607308f0845b18a7a2b5a Author: Julien Grall <jgrall@amazon.com> Date: Tue Apr 26 21:06:29 2022 +0100 xen/arm: alternative: Don't call vmap() within stop_machine_run() Commit 88a037e2cfe1 "page_alloc: assert IRQs are enabled in heap alloc/free" extended the checks in the buddy allocator to catch any use of the helpers from context with interrupts disabled. Unfortunately, the rule is not followed in the alternative code and this will result to crash at boot with debug enabled: (XEN) Xen call trace: (XEN) [<0022a510>] alloc_xenheap_pages+0x120/0x150 (PC) (XEN) [<00000000>] 00000000 (LR) (XEN) [<002736ac>] arch/arm/mm.c#xen_pt_update+0x144/0x6e4 (XEN) [<002740d4>] map_pages_to_xen+0x10/0x20 (XEN) [<00236864>] __vmap+0x400/0x4a4 (XEN) [<0026aee8>] arch/arm/alternative.c#__apply_alternatives_multi_stop+0x144/0x1ec (XEN) [<0022fe40>] stop_machine_run+0x23c/0x300 (XEN) [<002c40c4>] apply_alternatives_all+0x34/0x5c (XEN) [<002ce3e8>] start_xen+0xcb8/0x1024 (XEN) [<00200068>] arch/arm/arm32/head.o#primary_switched+0xc/0x1c The interrupts will be disabled by the state machine in stop_machine_run(), hence why the ASSERT is hit. For now the patch extending the checks has been reverted, but it would be good to re-introduce it (allocation with interrupts disabled is not desirable). So move the re-mapping of Xen to the caller of stop_machine_run(). Signed-off-by: Julien Grall <jgrall@amazon.com> Cc: David Vrabel <dvrabel@amazon.co.uk> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> (qemu changes not included) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [xen-unstable-smoke test] 169781: regressions - FAIL 2022-04-27 16:38 [xen-unstable-smoke test] 169781: regressions - FAIL osstest service owner @ 2022-04-27 17:10 ` Julien Grall 2022-04-27 23:02 ` Stefano Stabellini 2022-04-28 7:45 ` Jan Beulich 0 siblings, 2 replies; 12+ messages in thread From: Julien Grall @ 2022-04-27 17:10 UTC (permalink / raw) To: osstest service owner, xen-devel Cc: Jan Beulich, David Vrabel, Stefano Stabellini, Bertrand Marquis Hi, On 27/04/2022 17:38, osstest service owner wrote: > flight 169781 xen-unstable-smoke real [real] > flight 169785 xen-unstable-smoke real-retest [real] > http://logs.test-lab.xenproject.org/osstest/logs/169781/ > http://logs.test-lab.xenproject.org/osstest/logs/169785/ > > Regressions :-( > > Tests which did not succeed and are blocking, > including tests which could not be run: > test-arm64-arm64-xl-xsm 8 xen-boot fail REGR. vs. 169773 Well, I was overly optimistic :(. This now breaks in the ITS code: Apr 27 13:23:14.324831 (XEN) Xen call trace: Apr 27 13:23:14.324855 (XEN) [<000000000022a678>] alloc_xenheap_pages+0x178/0x194 (PC) Apr 27 13:23:14.336856 (XEN) [<000000000022a670>] alloc_xenheap_pages+0x170/0x194 (LR) Apr 27 13:23:14.336886 (XEN) [<0000000000237770>] _xmalloc+0x144/0x294 Apr 27 13:23:14.348773 (XEN) [<00000000002378d4>] _xzalloc+0x14/0x30 Apr 27 13:23:14.348808 (XEN) [<000000000027b4e4>] gicv3_lpi_init_rdist+0x54/0x324 Apr 27 13:23:14.348835 (XEN) [<0000000000279898>] arch/arm/gic-v3.c#gicv3_cpu_init+0x128/0x46c Apr 27 13:23:14.360799 (XEN) [<0000000000279bfc>] arch/arm/gic-v3.c#gicv3_secondary_cpu_init+0x20/0x50 Apr 27 13:23:14.372796 (XEN) [<0000000000277054>] gic_init_secondary_cpu+0x18/0x30 Apr 27 13:23:14.372829 (XEN) [<0000000000284518>] start_secondary+0x1a8/0x234 Apr 27 13:23:14.372856 (XEN) [<0000010722aa4200>] 0000010722aa4200 Apr 27 13:23:14.384793 (XEN) Apr 27 13:23:14.384823 (XEN) Apr 27 13:23:14.384845 (XEN) **************************************** Apr 27 13:23:14.384869 (XEN) Panic on CPU 2: Apr 27 13:23:14.384891 (XEN) Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 1)' failed at common/page_alloc.c:2212 Apr 27 13:23:14.396805 (XEN) **************************************** The GICv3 LPI code contains a few calls to xmalloc() that will be done while initializing the GIC CPU interface. I don't think we can delay the initialization of the LPI part past local_irq_enable(). So I think we will need to allocate the memory when preparing the CPU. Any thoughts? Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [xen-unstable-smoke test] 169781: regressions - FAIL 2022-04-27 17:10 ` Julien Grall @ 2022-04-27 23:02 ` Stefano Stabellini 2022-04-27 23:13 ` Julien Grall 2022-04-28 7:45 ` Jan Beulich 1 sibling, 1 reply; 12+ messages in thread From: Stefano Stabellini @ 2022-04-27 23:02 UTC (permalink / raw) To: Julien Grall Cc: osstest service owner, xen-devel, Jan Beulich, David Vrabel, Stefano Stabellini, Bertrand Marquis On Wed, 27 Apr 2022, Julien Grall wrote: > On 27/04/2022 17:38, osstest service owner wrote: > > flight 169781 xen-unstable-smoke real [real] > > flight 169785 xen-unstable-smoke real-retest [real] > > http://logs.test-lab.xenproject.org/osstest/logs/169781/ > > http://logs.test-lab.xenproject.org/osstest/logs/169785/ > > > > Regressions :-( > > > > Tests which did not succeed and are blocking, > > including tests which could not be run: > > test-arm64-arm64-xl-xsm 8 xen-boot fail REGR. vs. > > 169773 > > Well, I was overly optimistic :(. This now breaks in the ITS code: > > Apr 27 13:23:14.324831 (XEN) Xen call trace: > Apr 27 13:23:14.324855 (XEN) [<000000000022a678>] > alloc_xenheap_pages+0x178/0x194 (PC) > Apr 27 13:23:14.336856 (XEN) [<000000000022a670>] > alloc_xenheap_pages+0x170/0x194 (LR) > Apr 27 13:23:14.336886 (XEN) [<0000000000237770>] _xmalloc+0x144/0x294 > Apr 27 13:23:14.348773 (XEN) [<00000000002378d4>] _xzalloc+0x14/0x30 > Apr 27 13:23:14.348808 (XEN) [<000000000027b4e4>] > gicv3_lpi_init_rdist+0x54/0x324 > Apr 27 13:23:14.348835 (XEN) [<0000000000279898>] > arch/arm/gic-v3.c#gicv3_cpu_init+0x128/0x46c > Apr 27 13:23:14.360799 (XEN) [<0000000000279bfc>] > arch/arm/gic-v3.c#gicv3_secondary_cpu_init+0x20/0x50 > Apr 27 13:23:14.372796 (XEN) [<0000000000277054>] > gic_init_secondary_cpu+0x18/0x30 > Apr 27 13:23:14.372829 (XEN) [<0000000000284518>] > start_secondary+0x1a8/0x234 > Apr 27 13:23:14.372856 (XEN) [<0000010722aa4200>] 0000010722aa4200 > Apr 27 13:23:14.384793 (XEN) > Apr 27 13:23:14.384823 (XEN) > Apr 27 13:23:14.384845 (XEN) **************************************** > Apr 27 13:23:14.384869 (XEN) Panic on CPU 2: > Apr 27 13:23:14.384891 (XEN) Assertion '!in_irq() && (local_irq_is_enabled() > || num_online_cpus() <= 1)' failed at common/page_alloc.c:2212 > Apr 27 13:23:14.396805 (XEN) **************************************** > > The GICv3 LPI code contains a few calls to xmalloc() that will be done while > initializing the GIC CPU interface. I don't think we can delay the > initialization of the LPI part past local_irq_enable(). So I think we will > need to allocate the memory when preparing the CPU. > > Any thoughts? As a general principle I think the ASSERT is a good idea, and it should make the code better and safer. I would not change the code to make the ASSERT go away if not to improve the code. In this case, gicv3_lpi_init_rdist and gicv3_lpi_allocate_pendtable should be __init functions although they are not marked as __init at the moment. It seems to me that it is acceptable to allocate memory with interrupt disabled during __init. I cannot see any drawbacks with it. I think we should change the ASSERT to only trigger after __init: system_state == SYS_STATE_active. What do you think? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [xen-unstable-smoke test] 169781: regressions - FAIL 2022-04-27 23:02 ` Stefano Stabellini @ 2022-04-27 23:13 ` Julien Grall 2022-04-28 0:47 ` Stefano Stabellini 0 siblings, 1 reply; 12+ messages in thread From: Julien Grall @ 2022-04-27 23:13 UTC (permalink / raw) To: Stefano Stabellini Cc: osstest service owner, xen-devel, Jan Beulich, David Vrabel, Bertrand Marquis [-- Attachment #1: Type: text/plain, Size: 515 bytes --] Hi Stefano, On Thu, 28 Apr 2022, 00:02 Stefano Stabellini, <sstabellini@kernel.org> wrote > It seems to me that it is acceptable to allocate memory with interrupt > disabled during __init. I cannot see any drawbacks with it. I think we > should change the ASSERT to only trigger after __init: system_state == > SYS_STATE_active. > > What do you think? > This would solve the immediate problem but not the long term one (i.e cpu hotplug). So I think it would be better to properly fix it right away. Cheers, > [-- Attachment #2: Type: text/html, Size: 1165 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [xen-unstable-smoke test] 169781: regressions - FAIL 2022-04-27 23:13 ` Julien Grall @ 2022-04-28 0:47 ` Stefano Stabellini 2022-04-28 11:19 ` Julien Grall 0 siblings, 1 reply; 12+ messages in thread From: Stefano Stabellini @ 2022-04-28 0:47 UTC (permalink / raw) To: Julien Grall Cc: Stefano Stabellini, osstest service owner, xen-devel, Jan Beulich, David Vrabel, Bertrand Marquis On Thu, 28 Apr 2022, Julien Grall wrote: > Hi Stefano, > > On Thu, 28 Apr 2022, 00:02 Stefano Stabellini, <sstabellini@kernel.org> wrote > It seems to me that it is acceptable to allocate memory with interrupt > disabled during __init. I cannot see any drawbacks with it. I think we > should change the ASSERT to only trigger after __init: system_state == > SYS_STATE_active. > > What do you think? > > > This would solve the immediate problem but not the long term one (i.e cpu hotplug). > > So I think it would be better to properly fix it right away. Yeah, you are right about cpu hotplug. I think both statements are true: - it is true that this is supposed to work with cpu hotplug and these functions might be directly affected by cpu hotplug (by a CPU coming online later on) - it is also true that it might not make sense to ASSERT at __init time if IRQs are disabled. There might be other places, not affected by cpu hotplug, where we do memory allocation at __init time with IRQ disabled. It might still be a good idea to add the system_state == SYS_STATE_active check in the ASSERT, not to solve this specific problem but to avoid other issues. In regard to gicv3_lpi_allocate_pendtable, I haven't thought about the implications of cpu hotplug for LPIs and GICv3 before. Do you envision that in a CPU hotplug scenario gicv3_lpi_init_rdist would be called when the extra CPU comes online? Today gicv3_lpi_init_rdist is called based on the number of rdist_regions without checking if the CPU is online or offline (I think ?) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [xen-unstable-smoke test] 169781: regressions - FAIL 2022-04-28 0:47 ` Stefano Stabellini @ 2022-04-28 11:19 ` Julien Grall 2022-04-29 0:41 ` Stefano Stabellini 0 siblings, 1 reply; 12+ messages in thread From: Julien Grall @ 2022-04-28 11:19 UTC (permalink / raw) To: Stefano Stabellini, Julien Grall Cc: osstest service owner, xen-devel, Jan Beulich, David Vrabel, Bertrand Marquis Hi Stefano, On 28/04/2022 01:47, Stefano Stabellini wrote: > On Thu, 28 Apr 2022, Julien Grall wrote: >> Hi Stefano, >> >> On Thu, 28 Apr 2022, 00:02 Stefano Stabellini, <sstabellini@kernel.org> wrote >> It seems to me that it is acceptable to allocate memory with interrupt >> disabled during __init. I cannot see any drawbacks with it. I think we >> should change the ASSERT to only trigger after __init: system_state == >> SYS_STATE_active. >> >> What do you think? >> >> >> This would solve the immediate problem but not the long term one (i.e cpu hotplug). >> >> So I think it would be better to properly fix it right away. > > Yeah, you are right about cpu hotplug. I think both statements are true: > > - it is true that this is supposed to work with cpu hotplug and these > functions might be directly affected by cpu hotplug (by a CPU coming > online later on) > > - it is also true that it might not make sense to ASSERT at __init time > if IRQs are disabled. There might be other places, not affected by cpu > hotplug, where we do memory allocation at __init time with IRQ > disabled. It might still be a good idea to add the system_state == > SYS_STATE_active check in the ASSERT, not to solve this specific > problem but to avoid other issues. AFAIU, it is not safe on x86 to do TLB flush with interrupts disabled *and* multiple CPUs running. So we can't generically relax the check. Looking at the OSSTest results, both Arm32 and Arm64 without GICv3 ITS tests have passed. So it seems unnecessary to me to preemptively relax the check just for Arm. > > > In regard to gicv3_lpi_allocate_pendtable, I haven't thought about the > implications of cpu hotplug for LPIs and GICv3 before. Do you envision > that in a CPU hotplug scenario gicv3_lpi_init_rdist would be called when > the extra CPU comes online? It is already called per-CPU. See gicv3_secondary_cpu_init() -> gicv3_cpu_init() -> gicv3_populate_rdist(). > > Today gicv3_lpi_init_rdist is called based on the number of > rdist_regions without checking if the CPU is online or offline (I think ?) The re-distributors are not banked and therefore accessible by everyone. However, in Xen case, each pCPU will only touch its own re-distributor (well aside TYPER to figure out the ID). The loop in gicv3_populate_rdist() will walk throught all the re-distributor to find which one corresponds to the current pCPU. Once we found it, we will call gicv3_lpi_init_rdist() to fully initialize the re-distributor. I don't think we want to populate the memory for each re-distributor in advance. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [xen-unstable-smoke test] 169781: regressions - FAIL 2022-04-28 11:19 ` Julien Grall @ 2022-04-29 0:41 ` Stefano Stabellini 2022-04-29 9:04 ` Julien Grall 0 siblings, 1 reply; 12+ messages in thread From: Stefano Stabellini @ 2022-04-29 0:41 UTC (permalink / raw) To: Julien Grall Cc: Stefano Stabellini, Julien Grall, osstest service owner, xen-devel, Jan Beulich, David Vrabel, Bertrand Marquis On Thu, 28 Apr 2022, Julien Grall wrote: > On 28/04/2022 01:47, Stefano Stabellini wrote: > > On Thu, 28 Apr 2022, Julien Grall wrote: > > > Hi Stefano, > > > > > > On Thu, 28 Apr 2022, 00:02 Stefano Stabellini, <sstabellini@kernel.org> > > > wrote > > > It seems to me that it is acceptable to allocate memory with > > > interrupt > > > disabled during __init. I cannot see any drawbacks with it. I think > > > we > > > should change the ASSERT to only trigger after __init: system_state > > > == > > > SYS_STATE_active. > > > > > > What do you think? > > > > > > > > > This would solve the immediate problem but not the long term one (i.e cpu > > > hotplug). > > > > > > So I think it would be better to properly fix it right away. > > > > Yeah, you are right about cpu hotplug. I think both statements are true: > > > > - it is true that this is supposed to work with cpu hotplug and these > > functions might be directly affected by cpu hotplug (by a CPU coming > > online later on) > > > > - it is also true that it might not make sense to ASSERT at __init time > > if IRQs are disabled. There might be other places, not affected by cpu > > hotplug, where we do memory allocation at __init time with IRQ > > disabled. It might still be a good idea to add the system_state == > > SYS_STATE_active check in the ASSERT, not to solve this specific > > problem but to avoid other issues. > > AFAIU, it is not safe on x86 to do TLB flush with interrupts disabled *and* > multiple CPUs running. So we can't generically relax the check. > > Looking at the OSSTest results, both Arm32 and Arm64 without GICv3 ITS tests > have passed. So it seems unnecessary to me to preemptively relax the check > just for Arm. It is good news that it works already (GICv3 aside) on ARM. If you prefer not to relax it, I am OK with it (although it makes me a bit worried about future breakages). > > In regard to gicv3_lpi_allocate_pendtable, I haven't thought about the > > implications of cpu hotplug for LPIs and GICv3 before. Do you envision > > that in a CPU hotplug scenario gicv3_lpi_init_rdist would be called when > > the extra CPU comes online? > > It is already called per-CPU. See gicv3_secondary_cpu_init() -> > gicv3_cpu_init() -> gicv3_populate_rdist(). Got it, thanks! > > Today gicv3_lpi_init_rdist is called based on the number of > > rdist_regions without checking if the CPU is online or offline (I think ?) > > The re-distributors are not banked and therefore accessible by everyone. > However, in Xen case, each pCPU will only touch its own re-distributor (well > aside TYPER to figure out the ID). > > The loop in gicv3_populate_rdist() will walk throught all the > re-distributor to find which one corresponds to the current pCPU. Once we > found it, we will call gicv3_lpi_init_rdist() to fully initialize the > re-distributor. > > I don't think we want to populate the memory for each re-distributor in > advance. I agree. Currently we do: start_secondary [...] gic_init_secondary_cpu() [...] gicv3_lpi_init_rdist() [...] local_irq_enable(); Which seems to be the right sequence to me. There must be an early boot phase where interrupts are disabled on a CPU but memory allocations are possible. If this was x86 with the tlbflush limitation, I would suggest to have per-cpu memory mapping areas so that we don't have to do any global tlb flushes with interrupts disabled. On ARM, we don't have the tlbflush limitation so we could do that but we wouldn't have much to gain from it. Also, this seems to be a bit of a special case, because in general we can move drivers initializations later after local_irq_enable(). But this is the interrupt controller driver itself -- we cannot move it after local_irq_enable(). So maybe an ad-hoc solution could be acceptable? The only one I can think of is to check on system_state == SYS_STATE_active now. In the future for CPU hotplug we could have a per-CPU system_state, like cpu_system_state, and do a similar check. I am totally open to other ideas, I couldn't come up with anything better at the moment. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [xen-unstable-smoke test] 169781: regressions - FAIL 2022-04-29 0:41 ` Stefano Stabellini @ 2022-04-29 9:04 ` Julien Grall 2022-04-29 16:15 ` Stefano Stabellini 0 siblings, 1 reply; 12+ messages in thread From: Julien Grall @ 2022-04-29 9:04 UTC (permalink / raw) To: Stefano Stabellini Cc: Julien Grall, osstest service owner, xen-devel, Jan Beulich, David Vrabel, Bertrand Marquis Hi Stefano, On 29/04/2022 01:41, Stefano Stabellini wrote: > On Thu, 28 Apr 2022, Julien Grall wrote: >> On 28/04/2022 01:47, Stefano Stabellini wrote: >>> On Thu, 28 Apr 2022, Julien Grall wrote: >>>> Hi Stefano, >>>> >>>> On Thu, 28 Apr 2022, 00:02 Stefano Stabellini, <sstabellini@kernel.org> >>>> wrote >>>> It seems to me that it is acceptable to allocate memory with >>>> interrupt >>>> disabled during __init. I cannot see any drawbacks with it. I think >>>> we >>>> should change the ASSERT to only trigger after __init: system_state >>>> == >>>> SYS_STATE_active. >>>> >>>> What do you think? >>>> >>>> >>>> This would solve the immediate problem but not the long term one (i.e cpu >>>> hotplug). >>>> >>>> So I think it would be better to properly fix it right away. >>> >>> Yeah, you are right about cpu hotplug. I think both statements are true: >>> >>> - it is true that this is supposed to work with cpu hotplug and these >>> functions might be directly affected by cpu hotplug (by a CPU coming >>> online later on) >>> >>> - it is also true that it might not make sense to ASSERT at __init time >>> if IRQs are disabled. There might be other places, not affected by cpu >>> hotplug, where we do memory allocation at __init time with IRQ >>> disabled. It might still be a good idea to add the system_state == >>> SYS_STATE_active check in the ASSERT, not to solve this specific >>> problem but to avoid other issues. >> >> AFAIU, it is not safe on x86 to do TLB flush with interrupts disabled *and* >> multiple CPUs running. So we can't generically relax the check. >> >> Looking at the OSSTest results, both Arm32 and Arm64 without GICv3 ITS tests >> have passed. So it seems unnecessary to me to preemptively relax the check >> just for Arm. > > It is good news that it works already (GICv3 aside) on ARM. If you > prefer not to relax it, I am OK with it (although it makes me a bit > worried about future breakages). Bear in mind this is a debug only breakage, production build will work fines with any ASSERT() affecting large code base, it is going to be difficult to find all the potential misuse. So we have to rely on wider testing and fix it as it gets reported. If we relax the check, then we are never going to be able to harden the code in timely maneer. >>> In regard to gicv3_lpi_allocate_pendtable, I haven't thought about the >>> implications of cpu hotplug for LPIs and GICv3 before. Do you envision >>> that in a CPU hotplug scenario gicv3_lpi_init_rdist would be called when >>> the extra CPU comes online? >> >> It is already called per-CPU. See gicv3_secondary_cpu_init() -> >> gicv3_cpu_init() -> gicv3_populate_rdist(). > > Got it, thanks! > > >>> Today gicv3_lpi_init_rdist is called based on the number of >>> rdist_regions without checking if the CPU is online or offline (I think ?) >> >> The re-distributors are not banked and therefore accessible by everyone. >> However, in Xen case, each pCPU will only touch its own re-distributor (well >> aside TYPER to figure out the ID). >> >> The loop in gicv3_populate_rdist() will walk throught all the >> re-distributor to find which one corresponds to the current pCPU. Once we >> found it, we will call gicv3_lpi_init_rdist() to fully initialize the >> re-distributor. >> >> I don't think we want to populate the memory for each re-distributor in >> advance. > > I agree. > > Currently we do: > > start_secondary > [...] > gic_init_secondary_cpu() > [...] > gicv3_lpi_init_rdist() > [...] > local_irq_enable(); > > Which seems to be the right sequence to me. There must be an early boot > phase where interrupts are disabled on a CPU but memory allocations are > possible. If this was x86 with the tlbflush limitation, I would suggest > to have per-cpu memory mapping areas so that we don't have to do any > global tlb flushes with interrupts disabled. > > On ARM, we don't have the tlbflush limitation so we could do that but we > wouldn't have much to gain from it. > > Also, this seems to be a bit of a special case, because in general we > can move drivers initializations later after local_irq_enable(). But > this is the interrupt controller driver itself -- we cannot move it > after local_irq_enable(). > > So maybe an ad-hoc solution could be acceptable? We don't need any ad-hoc solution here. We can register a CPU notifier that will notify us when a CPU will be prepared. Something like below should work (untested yet): diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c index e1594dd20e4c..ccf4868540f5 100644 --- a/xen/arch/arm/gic-v3-lpi.c +++ b/xen/arch/arm/gic-v3-lpi.c @@ -18,6 +18,7 @@ * along with this program; If not, see <http://www.gnu.org/licenses/>. */ +#include <xen/cpu.h> #include <xen/lib.h> #include <xen/mm.h> #include <xen/param.h> @@ -234,18 +235,13 @@ void gicv3_lpi_update_host_entry(uint32_t host_lpi, int domain_id, write_u64_atomic(&hlpip->data, hlpi.data); } -static int gicv3_lpi_allocate_pendtable(uint64_t *reg) +static int gicv3_lpi_allocate_pendtable(unsigned int cpu) { - uint64_t val; void *pendtable; - if ( this_cpu(lpi_redist).pending_table ) + if ( per_cpu(lpi_redist, cpu).pending_table ) return -EBUSY; - val = GIC_BASER_CACHE_RaWaWb << GICR_PENDBASER_INNER_CACHEABILITY_SHIFT; - val |= GIC_BASER_CACHE_SameAsInner << GICR_PENDBASER_OUTER_CACHEABILITY_SHIFT; - val |= GIC_BASER_InnerShareable << GICR_PENDBASER_SHAREABILITY_SHIFT; - /* * The pending table holds one bit per LPI and even covers bits for * interrupt IDs below 8192, so we allocate the full range. @@ -265,13 +261,38 @@ static int gicv3_lpi_allocate_pendtable(uint64_t *reg) clean_and_invalidate_dcache_va_range(pendtable, lpi_data.max_host_lpi_ids / 8); - this_cpu(lpi_redist).pending_table = pendtable; + per_cpu(lpi_redist, cpu).pending_table = pendtable; - val |= GICR_PENDBASER_PTZ; + return 0; +} + +static int gicv3_lpi_set_pendtable(void __iomem *rdist_base) +{ + const void *pendtable = this_cpu(lpi_redist).pending_table; + uint64_t val; + + if ( !pendtable ) + return -ENOMEM; + ASSERT(virt_to_maddr(pendtable) & ~GENMASK(51, 16)); + + val = GIC_BASER_CACHE_RaWaWb << GICR_PENDBASER_INNER_CACHEABILITY_SHIFT; + val |= GIC_BASER_CACHE_SameAsInner << GICR_PENDBASER_OUTER_CACHEABILITY_SHIFT; + val |= GIC_BASER_InnerShareable << GICR_PENDBASER_SHAREABILITY_SHIFT; + val |= GICR_PENDBASER_PTZ; val |= virt_to_maddr(pendtable); - *reg = val; + writeq_relaxed(val, rdist_base + GICR_PENDBASER); + val = readq_relaxed(rdist_base + GICR_PENDBASER); + + /* If the hardware reports non-shareable, drop cacheability as well. */ + if ( !(val & GICR_PENDBASER_SHAREABILITY_MASK) ) + { + val &= ~GICR_PENDBASER_INNER_CACHEABILITY_MASK; + val |= GIC_BASER_CACHE_nC << GICR_PENDBASER_INNER_CACHEABILITY_SHIFT; + + writeq_relaxed(val, rdist_base + GICR_PENDBASER); + } return 0; } @@ -340,7 +361,6 @@ static int gicv3_lpi_set_proptable(void __iomem * rdist_base) int gicv3_lpi_init_rdist(void __iomem * rdist_base) { uint32_t reg; - uint64_t table_reg; int ret; /* We don't support LPIs without an ITS. */ @@ -352,24 +372,33 @@ int gicv3_lpi_init_rdist(void __iomem * rdist_base) if ( reg & GICR_CTLR_ENABLE_LPIS ) return -EBUSY; - ret = gicv3_lpi_allocate_pendtable(&table_reg); + ret = gicv3_lpi_set_pendtable(rdist_base); if ( ret ) return ret; - writeq_relaxed(table_reg, rdist_base + GICR_PENDBASER); - table_reg = readq_relaxed(rdist_base + GICR_PENDBASER); - /* If the hardware reports non-shareable, drop cacheability as well. */ - if ( !(table_reg & GICR_PENDBASER_SHAREABILITY_MASK) ) - { - table_reg &= ~GICR_PENDBASER_INNER_CACHEABILITY_MASK; - table_reg |= GIC_BASER_CACHE_nC << GICR_PENDBASER_INNER_CACHEABILITY_SHIFT; + return gicv3_lpi_set_proptable(rdist_base); +} + +static int cpu_callback(struct notifier_block *nfb, unsigned long action, + void *hcpu) +{ + unsigned long cpu = (unsigned long)hcpu; + int rc = 0; - writeq_relaxed(table_reg, rdist_base + GICR_PENDBASER); + switch ( action ) + { + case CPU_UP_PREPARE: + rc = gicv3_lpi_allocate_pendtable(cpu); + break; } - return gicv3_lpi_set_proptable(rdist_base); + return !rc ? NOTIFY_DONE : notifier_from_errno(rc); } +static struct notifier_block cpu_nfb = { + .notifier_call = cpu_callback, +}; + static unsigned int max_lpi_bits = 20; integer_param("max_lpi_bits", max_lpi_bits); @@ -381,6 +410,7 @@ integer_param("max_lpi_bits", max_lpi_bits); int gicv3_lpi_init_host_lpis(unsigned int host_lpi_bits) { unsigned int nr_lpi_ptrs; + int rc; /* We rely on the data structure being atomically accessible. */ BUILD_BUG_ON(sizeof(union host_lpi) > sizeof(unsigned long)); @@ -413,7 +443,14 @@ int gicv3_lpi_init_host_lpis(unsigned int host_lpi_bits) printk("GICv3: using at most %lu LPIs on the host.\n", MAX_NR_HOST_LPIS); - return 0; + /* Register the CPU notifier and allocate memory for the boot CPU */ + register_cpu_notifier(&cpu_nfb); + rc = gicv3_lpi_allocate_pendtable(smp_processor_id()); + if ( rc ) + printk(XENLOG_ERR "Unable to allocate the pendtable for CPU%u\n", + smp_processor_id()); + + return rc; } static int find_unused_host_lpi(uint32_t start, uint32_t *index) Cheers, -- Julien Grall ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [xen-unstable-smoke test] 169781: regressions - FAIL 2022-04-29 9:04 ` Julien Grall @ 2022-04-29 16:15 ` Stefano Stabellini 2022-04-29 16:47 ` Julien Grall 0 siblings, 1 reply; 12+ messages in thread From: Stefano Stabellini @ 2022-04-29 16:15 UTC (permalink / raw) To: Julien Grall Cc: Stefano Stabellini, Julien Grall, osstest service owner, xen-devel, Jan Beulich, David Vrabel, Bertrand Marquis On Fri, 29 Apr 2022, Julien Grall wrote: > On 29/04/2022 01:41, Stefano Stabellini wrote: > > On Thu, 28 Apr 2022, Julien Grall wrote: > > > On 28/04/2022 01:47, Stefano Stabellini wrote: > > > > On Thu, 28 Apr 2022, Julien Grall wrote: > > > > > Hi Stefano, > > > > > > > > > > On Thu, 28 Apr 2022, 00:02 Stefano Stabellini, > > > > > <sstabellini@kernel.org> > > > > > wrote > > > > > It seems to me that it is acceptable to allocate memory with > > > > > interrupt > > > > > disabled during __init. I cannot see any drawbacks with it. I > > > > > think > > > > > we > > > > > should change the ASSERT to only trigger after __init: > > > > > system_state > > > > > == > > > > > SYS_STATE_active. > > > > > > > > > > What do you think? > > > > > > > > > > > > > > > This would solve the immediate problem but not the long term one (i.e > > > > > cpu > > > > > hotplug). > > > > > > > > > > So I think it would be better to properly fix it right away. > > > > > > > > Yeah, you are right about cpu hotplug. I think both statements are true: > > > > > > > > - it is true that this is supposed to work with cpu hotplug and these > > > > functions might be directly affected by cpu hotplug (by a CPU coming > > > > online later on) > > > > > > > > - it is also true that it might not make sense to ASSERT at __init time > > > > if IRQs are disabled. There might be other places, not affected by > > > > cpu > > > > hotplug, where we do memory allocation at __init time with IRQ > > > > disabled. It might still be a good idea to add the system_state == > > > > SYS_STATE_active check in the ASSERT, not to solve this specific > > > > problem but to avoid other issues. > > > > > > AFAIU, it is not safe on x86 to do TLB flush with interrupts disabled > > > *and* > > > multiple CPUs running. So we can't generically relax the check. > > > > > > Looking at the OSSTest results, both Arm32 and Arm64 without GICv3 ITS > > > tests > > > have passed. So it seems unnecessary to me to preemptively relax the check > > > just for Arm. > > > > It is good news that it works already (GICv3 aside) on ARM. If you > > prefer not to relax it, I am OK with it (although it makes me a bit > > worried about future breakages). > > Bear in mind this is a debug only breakage, production build will work fines > with any ASSERT() affecting large code base, it is going to be difficult to > find all the potential misuse. So we have to rely on wider testing and fix it > as it gets reported. > > If we relax the check, then we are never going to be able to harden the code > in timely maneer. > > > > > In regard to gicv3_lpi_allocate_pendtable, I haven't thought about the > > > > implications of cpu hotplug for LPIs and GICv3 before. Do you envision > > > > that in a CPU hotplug scenario gicv3_lpi_init_rdist would be called when > > > > the extra CPU comes online? > > > > > > It is already called per-CPU. See gicv3_secondary_cpu_init() -> > > > gicv3_cpu_init() -> gicv3_populate_rdist(). > > > > Got it, thanks! > > > > > > > > Today gicv3_lpi_init_rdist is called based on the number of > > > > rdist_regions without checking if the CPU is online or offline (I think > > > > ?) > > > > > > The re-distributors are not banked and therefore accessible by everyone. > > > However, in Xen case, each pCPU will only touch its own re-distributor > > > (well > > > aside TYPER to figure out the ID). > > > > > > The loop in gicv3_populate_rdist() will walk throught all the > > > re-distributor to find which one corresponds to the current pCPU. Once we > > > found it, we will call gicv3_lpi_init_rdist() to fully initialize the > > > re-distributor. > > > > > > I don't think we want to populate the memory for each re-distributor in > > > advance. > > > > I agree. > > > > Currently we do: > > > > start_secondary > > [...] > > gic_init_secondary_cpu() > > [...] > > gicv3_lpi_init_rdist() > > [...] > > local_irq_enable(); > > > > Which seems to be the right sequence to me. There must be an early boot > > phase where interrupts are disabled on a CPU but memory allocations are > > possible. If this was x86 with the tlbflush limitation, I would suggest > > to have per-cpu memory mapping areas so that we don't have to do any > > global tlb flushes with interrupts disabled. > > > > On ARM, we don't have the tlbflush limitation so we could do that but we > > wouldn't have much to gain from it. > > > > Also, this seems to be a bit of a special case, because in general we > > can move drivers initializations later after local_irq_enable(). But > > this is the interrupt controller driver itself -- we cannot move it > > after local_irq_enable(). > > > > So maybe an ad-hoc solution could be acceptable? > > We don't need any ad-hoc solution here. We can register a CPU notifier that > will notify us when a CPU will be prepared. Something like below should work > (untested yet): The CPU notifier is a good idea. It looks like the patch below got corrupted somehow by the mailer. If you send it as a proper patch I am happy to have a look. > diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c > index e1594dd20e4c..ccf4868540f5 100644 > --- a/xen/arch/arm/gic-v3-lpi.c > +++ b/xen/arch/arm/gic-v3-lpi.c > @@ -18,6 +18,7 @@ > * along with this program; If not, see <http://www.gnu.org/licenses/>. > */ > > +#include <xen/cpu.h> > #include <xen/lib.h> > #include <xen/mm.h> > #include <xen/param.h> > @@ -234,18 +235,13 @@ void gicv3_lpi_update_host_entry(uint32_t host_lpi, int > domain_id, > write_u64_atomic(&hlpip->data, hlpi.data); > } > > -static int gicv3_lpi_allocate_pendtable(uint64_t *reg) > +static int gicv3_lpi_allocate_pendtable(unsigned int cpu) > { > - uint64_t val; > void *pendtable; > > - if ( this_cpu(lpi_redist).pending_table ) > + if ( per_cpu(lpi_redist, cpu).pending_table ) > return -EBUSY; > > - val = GIC_BASER_CACHE_RaWaWb << GICR_PENDBASER_INNER_CACHEABILITY_SHIFT; > - val |= GIC_BASER_CACHE_SameAsInner << > GICR_PENDBASER_OUTER_CACHEABILITY_SHIFT; > - val |= GIC_BASER_InnerShareable << GICR_PENDBASER_SHAREABILITY_SHIFT; > - > /* > * The pending table holds one bit per LPI and even covers bits for > * interrupt IDs below 8192, so we allocate the full range. > @@ -265,13 +261,38 @@ static int gicv3_lpi_allocate_pendtable(uint64_t *reg) > clean_and_invalidate_dcache_va_range(pendtable, > lpi_data.max_host_lpi_ids / 8); > > - this_cpu(lpi_redist).pending_table = pendtable; > + per_cpu(lpi_redist, cpu).pending_table = pendtable; > > - val |= GICR_PENDBASER_PTZ; > + return 0; > +} > + > +static int gicv3_lpi_set_pendtable(void __iomem *rdist_base) > +{ > + const void *pendtable = this_cpu(lpi_redist).pending_table; > + uint64_t val; > + > + if ( !pendtable ) > + return -ENOMEM; > > + ASSERT(virt_to_maddr(pendtable) & ~GENMASK(51, 16)); > + > + val = GIC_BASER_CACHE_RaWaWb << GICR_PENDBASER_INNER_CACHEABILITY_SHIFT; > + val |= GIC_BASER_CACHE_SameAsInner << > GICR_PENDBASER_OUTER_CACHEABILITY_SHIFT; > + val |= GIC_BASER_InnerShareable << GICR_PENDBASER_SHAREABILITY_SHIFT; > + val |= GICR_PENDBASER_PTZ; > val |= virt_to_maddr(pendtable); > > - *reg = val; > + writeq_relaxed(val, rdist_base + GICR_PENDBASER); > + val = readq_relaxed(rdist_base + GICR_PENDBASER); > + > + /* If the hardware reports non-shareable, drop cacheability as well. */ > + if ( !(val & GICR_PENDBASER_SHAREABILITY_MASK) ) > + { > + val &= ~GICR_PENDBASER_INNER_CACHEABILITY_MASK; > + val |= GIC_BASER_CACHE_nC << GICR_PENDBASER_INNER_CACHEABILITY_SHIFT; > + > + writeq_relaxed(val, rdist_base + GICR_PENDBASER); > + } > > return 0; > } > @@ -340,7 +361,6 @@ static int gicv3_lpi_set_proptable(void __iomem * > rdist_base) > int gicv3_lpi_init_rdist(void __iomem * rdist_base) > { > uint32_t reg; > - uint64_t table_reg; > int ret; > > /* We don't support LPIs without an ITS. */ > @@ -352,24 +372,33 @@ int gicv3_lpi_init_rdist(void __iomem * rdist_base) > if ( reg & GICR_CTLR_ENABLE_LPIS ) > return -EBUSY; > > - ret = gicv3_lpi_allocate_pendtable(&table_reg); > + ret = gicv3_lpi_set_pendtable(rdist_base); > if ( ret ) > return ret; > - writeq_relaxed(table_reg, rdist_base + GICR_PENDBASER); > - table_reg = readq_relaxed(rdist_base + GICR_PENDBASER); > > - /* If the hardware reports non-shareable, drop cacheability as well. */ > - if ( !(table_reg & GICR_PENDBASER_SHAREABILITY_MASK) ) > - { > - table_reg &= ~GICR_PENDBASER_INNER_CACHEABILITY_MASK; > - table_reg |= GIC_BASER_CACHE_nC << > GICR_PENDBASER_INNER_CACHEABILITY_SHIFT; > + return gicv3_lpi_set_proptable(rdist_base); > +} > + > +static int cpu_callback(struct notifier_block *nfb, unsigned long action, > + void *hcpu) > +{ > + unsigned long cpu = (unsigned long)hcpu; > + int rc = 0; > > - writeq_relaxed(table_reg, rdist_base + GICR_PENDBASER); > + switch ( action ) > + { > + case CPU_UP_PREPARE: > + rc = gicv3_lpi_allocate_pendtable(cpu); > + break; > } > > - return gicv3_lpi_set_proptable(rdist_base); > + return !rc ? NOTIFY_DONE : notifier_from_errno(rc); > } > > +static struct notifier_block cpu_nfb = { > + .notifier_call = cpu_callback, > +}; > + > static unsigned int max_lpi_bits = 20; > integer_param("max_lpi_bits", max_lpi_bits); > > @@ -381,6 +410,7 @@ integer_param("max_lpi_bits", max_lpi_bits); > int gicv3_lpi_init_host_lpis(unsigned int host_lpi_bits) > { > unsigned int nr_lpi_ptrs; > + int rc; > > /* We rely on the data structure being atomically accessible. */ > BUILD_BUG_ON(sizeof(union host_lpi) > sizeof(unsigned long)); > @@ -413,7 +443,14 @@ int gicv3_lpi_init_host_lpis(unsigned int host_lpi_bits) > > printk("GICv3: using at most %lu LPIs on the host.\n", MAX_NR_HOST_LPIS); > > - return 0; > + /* Register the CPU notifier and allocate memory for the boot CPU */ > + register_cpu_notifier(&cpu_nfb); > + rc = gicv3_lpi_allocate_pendtable(smp_processor_id()); > + if ( rc ) > + printk(XENLOG_ERR "Unable to allocate the pendtable for CPU%u\n", > + smp_processor_id()); > + > + return rc; > } > > static int find_unused_host_lpi(uint32_t start, uint32_t *index) > > Cheers, > > -- > Julien Grall > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [xen-unstable-smoke test] 169781: regressions - FAIL 2022-04-29 16:15 ` Stefano Stabellini @ 2022-04-29 16:47 ` Julien Grall 0 siblings, 0 replies; 12+ messages in thread From: Julien Grall @ 2022-04-29 16:47 UTC (permalink / raw) To: Stefano Stabellini Cc: Julien Grall, osstest service owner, xen-devel, Jan Beulich, David Vrabel, Bertrand Marquis On 29/04/2022 17:15, Stefano Stabellini wrote: >>> Which seems to be the right sequence to me. There must be an early boot >>> phase where interrupts are disabled on a CPU but memory allocations are >>> possible. If this was x86 with the tlbflush limitation, I would suggest >>> to have per-cpu memory mapping areas so that we don't have to do any >>> global tlb flushes with interrupts disabled. >>> >>> On ARM, we don't have the tlbflush limitation so we could do that but we >>> wouldn't have much to gain from it. >>> >>> Also, this seems to be a bit of a special case, because in general we >>> can move drivers initializations later after local_irq_enable(). But >>> this is the interrupt controller driver itself -- we cannot move it >>> after local_irq_enable(). >>> >>> So maybe an ad-hoc solution could be acceptable? >> >> We don't need any ad-hoc solution here. We can register a CPU notifier that >> will notify us when a CPU will be prepared. Something like below should work >> (untested yet): > > The CPU notifier is a good idea. It looks like the patch below got > corrupted somehow by the mailer. If you send it as a proper patch I am > happy to have a look. Doh. I will send a proper patch once I have done some testing. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [xen-unstable-smoke test] 169781: regressions - FAIL 2022-04-27 17:10 ` Julien Grall 2022-04-27 23:02 ` Stefano Stabellini @ 2022-04-28 7:45 ` Jan Beulich 2022-04-28 8:45 ` Julien Grall 1 sibling, 1 reply; 12+ messages in thread From: Jan Beulich @ 2022-04-28 7:45 UTC (permalink / raw) To: Julien Grall, osstest service owner Cc: David Vrabel, Stefano Stabellini, Bertrand Marquis, xen-devel On 27.04.2022 19:10, Julien Grall wrote: > Hi, > > On 27/04/2022 17:38, osstest service owner wrote: >> flight 169781 xen-unstable-smoke real [real] >> flight 169785 xen-unstable-smoke real-retest [real] >> http://logs.test-lab.xenproject.org/osstest/logs/169781/ >> http://logs.test-lab.xenproject.org/osstest/logs/169785/ >> >> Regressions :-( >> >> Tests which did not succeed and are blocking, >> including tests which could not be run: >> test-arm64-arm64-xl-xsm 8 xen-boot fail REGR. vs. 169773 > > Well, I was overly optimistic :(. This now breaks in the ITS code: > > Apr 27 13:23:14.324831 (XEN) Xen call trace: > Apr 27 13:23:14.324855 (XEN) [<000000000022a678>] > alloc_xenheap_pages+0x178/0x194 (PC) > Apr 27 13:23:14.336856 (XEN) [<000000000022a670>] > alloc_xenheap_pages+0x170/0x194 (LR) > Apr 27 13:23:14.336886 (XEN) [<0000000000237770>] _xmalloc+0x144/0x294 > Apr 27 13:23:14.348773 (XEN) [<00000000002378d4>] _xzalloc+0x14/0x30 > Apr 27 13:23:14.348808 (XEN) [<000000000027b4e4>] > gicv3_lpi_init_rdist+0x54/0x324 > Apr 27 13:23:14.348835 (XEN) [<0000000000279898>] > arch/arm/gic-v3.c#gicv3_cpu_init+0x128/0x46c > Apr 27 13:23:14.360799 (XEN) [<0000000000279bfc>] > arch/arm/gic-v3.c#gicv3_secondary_cpu_init+0x20/0x50 > Apr 27 13:23:14.372796 (XEN) [<0000000000277054>] > gic_init_secondary_cpu+0x18/0x30 > Apr 27 13:23:14.372829 (XEN) [<0000000000284518>] > start_secondary+0x1a8/0x234 > Apr 27 13:23:14.372856 (XEN) [<0000010722aa4200>] 0000010722aa4200 > Apr 27 13:23:14.384793 (XEN) > Apr 27 13:23:14.384823 (XEN) > Apr 27 13:23:14.384845 (XEN) **************************************** > Apr 27 13:23:14.384869 (XEN) Panic on CPU 2: > Apr 27 13:23:14.384891 (XEN) Assertion '!in_irq() && > (local_irq_is_enabled() || num_online_cpus() <= 1)' failed at > common/page_alloc.c:2212 > Apr 27 13:23:14.396805 (XEN) **************************************** > > The GICv3 LPI code contains a few calls to xmalloc() that will be done > while initializing the GIC CPU interface. I don't think we can delay the > initialization of the LPI part past local_irq_enable(). So I think we > will need to allocate the memory when preparing the CPU. Do you have an explanation why the next flight (169800) passed? Jan ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [xen-unstable-smoke test] 169781: regressions - FAIL 2022-04-28 7:45 ` Jan Beulich @ 2022-04-28 8:45 ` Julien Grall 0 siblings, 0 replies; 12+ messages in thread From: Julien Grall @ 2022-04-28 8:45 UTC (permalink / raw) To: Jan Beulich, osstest service owner Cc: David Vrabel, Stefano Stabellini, Bertrand Marquis, xen-devel Hi Jan, On 28/04/2022 08:45, Jan Beulich wrote: > On 27.04.2022 19:10, Julien Grall wrote: >> Hi, >> >> On 27/04/2022 17:38, osstest service owner wrote: >>> flight 169781 xen-unstable-smoke real [real] >>> flight 169785 xen-unstable-smoke real-retest [real] >>> http://logs.test-lab.xenproject.org/osstest/logs/169781/ >>> http://logs.test-lab.xenproject.org/osstest/logs/169785/ >>> >>> Regressions :-( >>> >>> Tests which did not succeed and are blocking, >>> including tests which could not be run: >>> test-arm64-arm64-xl-xsm 8 xen-boot fail REGR. vs. 169773 >> >> Well, I was overly optimistic :(. This now breaks in the ITS code: >> >> Apr 27 13:23:14.324831 (XEN) Xen call trace: >> Apr 27 13:23:14.324855 (XEN) [<000000000022a678>] >> alloc_xenheap_pages+0x178/0x194 (PC) >> Apr 27 13:23:14.336856 (XEN) [<000000000022a670>] >> alloc_xenheap_pages+0x170/0x194 (LR) >> Apr 27 13:23:14.336886 (XEN) [<0000000000237770>] _xmalloc+0x144/0x294 >> Apr 27 13:23:14.348773 (XEN) [<00000000002378d4>] _xzalloc+0x14/0x30 >> Apr 27 13:23:14.348808 (XEN) [<000000000027b4e4>] >> gicv3_lpi_init_rdist+0x54/0x324 >> Apr 27 13:23:14.348835 (XEN) [<0000000000279898>] >> arch/arm/gic-v3.c#gicv3_cpu_init+0x128/0x46c >> Apr 27 13:23:14.360799 (XEN) [<0000000000279bfc>] >> arch/arm/gic-v3.c#gicv3_secondary_cpu_init+0x20/0x50 >> Apr 27 13:23:14.372796 (XEN) [<0000000000277054>] >> gic_init_secondary_cpu+0x18/0x30 >> Apr 27 13:23:14.372829 (XEN) [<0000000000284518>] >> start_secondary+0x1a8/0x234 >> Apr 27 13:23:14.372856 (XEN) [<0000010722aa4200>] 0000010722aa4200 >> Apr 27 13:23:14.384793 (XEN) >> Apr 27 13:23:14.384823 (XEN) >> Apr 27 13:23:14.384845 (XEN) **************************************** >> Apr 27 13:23:14.384869 (XEN) Panic on CPU 2: >> Apr 27 13:23:14.384891 (XEN) Assertion '!in_irq() && >> (local_irq_is_enabled() || num_online_cpus() <= 1)' failed at >> common/page_alloc.c:2212 >> Apr 27 13:23:14.396805 (XEN) **************************************** >> >> The GICv3 LPI code contains a few calls to xmalloc() that will be done >> while initializing the GIC CPU interface. I don't think we can delay the >> initialization of the LPI part past local_irq_enable(). So I think we >> will need to allocate the memory when preparing the CPU. > > Do you have an explanation why the next flight (169800) passed? The flight 169800 ran on laxtonX (Softiron) which doesn't have a GICv3 ITS. I thought OSSTest would try to run the next flight on the same HW to check heisenbug, but maybe this doesn't happen for the smoke test? Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2022-04-29 16:47 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-04-27 16:38 [xen-unstable-smoke test] 169781: regressions - FAIL osstest service owner 2022-04-27 17:10 ` Julien Grall 2022-04-27 23:02 ` Stefano Stabellini 2022-04-27 23:13 ` Julien Grall 2022-04-28 0:47 ` Stefano Stabellini 2022-04-28 11:19 ` Julien Grall 2022-04-29 0:41 ` Stefano Stabellini 2022-04-29 9:04 ` Julien Grall 2022-04-29 16:15 ` Stefano Stabellini 2022-04-29 16:47 ` Julien Grall 2022-04-28 7:45 ` Jan Beulich 2022-04-28 8:45 ` Julien Grall
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.