* [xen-unstable-smoke test] 169781: regressions - FAIL
From: osstest service owner @ 2022-04-27 16:38 UTC
To: xen-devel
flight 169781 xen-unstable-smoke real [real]
flight 169785 xen-unstable-smoke real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/169781/
http://logs.test-lab.xenproject.org/osstest/logs/169785/
Regressions :-(
Tests which did not succeed and are blocking,
including tests which could not be run:
test-arm64-arm64-xl-xsm 8 xen-boot fail REGR. vs. 169773
Tests which did not succeed, but are not blocking:
test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
test-armhf-armhf-xl 15 migrate-support-check fail never pass
test-armhf-armhf-xl 16 saverestore-support-check fail never pass
version targeted for testing:
xen fa6dc0879ffd3dffffaea2837953c7a8761a9ba0
baseline version:
xen 163071b1800304c962756789b4ef0ddb978059ba
Last test of basis 169773 2022-04-27 08:01:54 Z 0 days
Testing same since 169781 2022-04-27 12:01:52 Z 0 days 1 attempts
------------------------------------------------------------
People who touched revisions under test:
David Vrabel <dvrabel@amazon.co.uk>
Julien Grall <jgrall@amazon.com>
jobs:
build-arm64-xsm pass
build-amd64 pass
build-armhf pass
build-amd64-libvirt pass
test-armhf-armhf-xl pass
test-arm64-arm64-xl-xsm fail
test-amd64-amd64-xl-qemuu-debianhvm-amd64 pass
test-amd64-amd64-libvirt pass
------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images
Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs
Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master
Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary
Not pushing.
------------------------------------------------------------
commit fa6dc0879ffd3dffffaea2837953c7a8761a9ba0
Author: David Vrabel <dvrabel@amazon.co.uk>
Date: Tue Apr 26 10:33:01 2022 +0200
page_alloc: assert IRQs are enabled in heap alloc/free
Heap pages can only be safely allocated and freed with interrupts
enabled as they may require a TLB flush which may send IPIs (on x86).
Normally spinlock debugging would catch calls from the incorrect
context, but not from stop_machine_run() action functions as these are
called with spin lock debugging disabled.
Enhance the assertions in alloc_xenheap_pages() and
alloc_domheap_pages() to check interrupts are enabled. For consistency
the same asserts are used when freeing heap pages.
As an exception, when only 1 PCPU is online, allocations are permitted
with interrupts disabled as any TLB flushes would be local only. This
is necessary during early boot.
Signed-off-by: David Vrabel <dvrabel@amazon.co.uk>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
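For illustration only, the rule described in this commit message can be modelled as a standalone predicate. This is a minimal sketch, not the Xen source: heap_alloc_allowed() and its parameters are hypothetical stand-ins for in_irq(), local_irq_is_enabled() and num_online_cpus(), and the condition is copied from the assertion text printed in the panic later in this thread.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the heap alloc/free assertion:
 *   !in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 1)
 * The three parameters stand in for the real helpers.
 */
bool heap_alloc_allowed(bool in_irq, bool irqs_enabled,
                        unsigned int online_cpus)
{
    /* Never allowed from interrupt context. */
    if (in_irq)
        return false;

    /*
     * With interrupts disabled, only allowed while a single CPU is
     * online, since any TLB flush would then be local only.
     */
    return irqs_enabled || online_cpus <= 1;
}
```

Under this model, early boot on a single CPU with interrupts disabled is accepted, while the same call with several CPUs online is rejected.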
commit fbd2445558beff90eb9607308f0845b18a7a2b5a
Author: Julien Grall <jgrall@amazon.com>
Date: Tue Apr 26 21:06:29 2022 +0100
xen/arm: alternative: Don't call vmap() within stop_machine_run()
Commit 88a037e2cfe1 "page_alloc: assert IRQs are enabled in heap
alloc/free" extended the checks in the buddy allocator to catch
any use of the helpers from context with interrupts disabled.
Unfortunately, the rule is not followed in the alternative code and
this will result in a crash at boot with debug enabled:
(XEN) Xen call trace:
(XEN) [<0022a510>] alloc_xenheap_pages+0x120/0x150 (PC)
(XEN) [<00000000>] 00000000 (LR)
(XEN) [<002736ac>] arch/arm/mm.c#xen_pt_update+0x144/0x6e4
(XEN) [<002740d4>] map_pages_to_xen+0x10/0x20
(XEN) [<00236864>] __vmap+0x400/0x4a4
(XEN) [<0026aee8>] arch/arm/alternative.c#__apply_alternatives_multi_stop+0x144/0x1ec
(XEN) [<0022fe40>] stop_machine_run+0x23c/0x300
(XEN) [<002c40c4>] apply_alternatives_all+0x34/0x5c
(XEN) [<002ce3e8>] start_xen+0xcb8/0x1024
(XEN) [<00200068>] arch/arm/arm32/head.o#primary_switched+0xc/0x1c
The interrupts will be disabled by the state machine in stop_machine_run(),
hence why the ASSERT is hit.
For now the patch extending the checks has been reverted, but it would
be good to re-introduce it (allocation with interrupts disabled is not
desirable).
So move the re-mapping of Xen to the caller of stop_machine_run().
Signed-off-by: Julien Grall <jgrall@amazon.com>
Cc: David Vrabel <dvrabel@amazon.co.uk>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(qemu changes not included)
* Re: [xen-unstable-smoke test] 169781: regressions - FAIL
From: Julien Grall @ 2022-04-27 17:10 UTC
To: osstest service owner, xen-devel
Cc: Jan Beulich, David Vrabel, Stefano Stabellini, Bertrand Marquis
Hi,
On 27/04/2022 17:38, osstest service owner wrote:
> flight 169781 xen-unstable-smoke real [real]
> flight 169785 xen-unstable-smoke real-retest [real]
> http://logs.test-lab.xenproject.org/osstest/logs/169781/
> http://logs.test-lab.xenproject.org/osstest/logs/169785/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
> test-arm64-arm64-xl-xsm 8 xen-boot fail REGR. vs. 169773
Well, I was overly optimistic :(. This now breaks in the ITS code:
Apr 27 13:23:14.324831 (XEN) Xen call trace:
Apr 27 13:23:14.324855 (XEN) [<000000000022a678>] alloc_xenheap_pages+0x178/0x194 (PC)
Apr 27 13:23:14.336856 (XEN) [<000000000022a670>] alloc_xenheap_pages+0x170/0x194 (LR)
Apr 27 13:23:14.336886 (XEN) [<0000000000237770>] _xmalloc+0x144/0x294
Apr 27 13:23:14.348773 (XEN) [<00000000002378d4>] _xzalloc+0x14/0x30
Apr 27 13:23:14.348808 (XEN) [<000000000027b4e4>] gicv3_lpi_init_rdist+0x54/0x324
Apr 27 13:23:14.348835 (XEN) [<0000000000279898>] arch/arm/gic-v3.c#gicv3_cpu_init+0x128/0x46c
Apr 27 13:23:14.360799 (XEN) [<0000000000279bfc>] arch/arm/gic-v3.c#gicv3_secondary_cpu_init+0x20/0x50
Apr 27 13:23:14.372796 (XEN) [<0000000000277054>] gic_init_secondary_cpu+0x18/0x30
Apr 27 13:23:14.372829 (XEN) [<0000000000284518>] start_secondary+0x1a8/0x234
Apr 27 13:23:14.372856 (XEN) [<0000010722aa4200>] 0000010722aa4200
Apr 27 13:23:14.384793 (XEN)
Apr 27 13:23:14.384823 (XEN)
Apr 27 13:23:14.384845 (XEN) ****************************************
Apr 27 13:23:14.384869 (XEN) Panic on CPU 2:
Apr 27 13:23:14.384891 (XEN) Assertion '!in_irq() && (local_irq_is_enabled() || num_online_cpus() <= 1)' failed at common/page_alloc.c:2212
Apr 27 13:23:14.396805 (XEN) ****************************************
The GICv3 LPI code contains a few calls to xmalloc() that will be done
while initializing the GIC CPU interface. I don't think we can delay the
initialization of the LPI part past local_irq_enable(). So I think we
will need to allocate the memory when preparing the CPU.
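To make the idea concrete, here is a rough standalone sketch of splitting the work in two phases: the allocation happens while the CPU is being prepared (on a CPU that already has interrupts enabled), and the CPU coming up only consumes what was allocated. All names here (NR_MODEL_CPUS, prepare_cpu(), cpu_table()) are hypothetical, and calloc() merely stands in for the hypervisor allocator.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_MODEL_CPUS 4 /* hypothetical CPU count for the model */

/* One pre-allocated table per CPU; NULL until the CPU is prepared. */
static void *table[NR_MODEL_CPUS];

/*
 * Phase 1: called for 'cpu' from a CPU that is already online, with
 * interrupts enabled, so allocating here is fine.
 */
int prepare_cpu(unsigned int cpu, size_t size)
{
    if (cpu >= NR_MODEL_CPUS)
        return -EINVAL;
    if (table[cpu])
        return -EBUSY;

    table[cpu] = calloc(1, size);
    return table[cpu] ? 0 : -ENOMEM;
}

/*
 * Phase 2: called on the CPU being brought up, with interrupts still
 * disabled. It only looks up the table and never allocates.
 */
void *cpu_table(unsigned int cpu)
{
    return cpu < NR_MODEL_CPUS ? table[cpu] : NULL;
}
```

With this split, a missing table at phase 2 becomes an ordinary error for the caller to handle rather than an allocation attempt with interrupts disabled.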
Any thoughts?
Cheers,
--
Julien Grall
* Re: [xen-unstable-smoke test] 169781: regressions - FAIL
From: Stefano Stabellini @ 2022-04-27 23:02 UTC
To: Julien Grall
Cc: osstest service owner, xen-devel, Jan Beulich, David Vrabel,
Stefano Stabellini, Bertrand Marquis
On Wed, 27 Apr 2022, Julien Grall wrote:
> On 27/04/2022 17:38, osstest service owner wrote:
> > flight 169781 xen-unstable-smoke real [real]
> > flight 169785 xen-unstable-smoke real-retest [real]
> > http://logs.test-lab.xenproject.org/osstest/logs/169781/
> > http://logs.test-lab.xenproject.org/osstest/logs/169785/
> >
> > Regressions :-(
> >
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> > test-arm64-arm64-xl-xsm 8 xen-boot fail REGR. vs.
> > 169773
>
> Well, I was overly optimistic :(. This now breaks in the ITS code:
>
> Apr 27 13:23:14.324831 (XEN) Xen call trace:
> Apr 27 13:23:14.324855 (XEN) [<000000000022a678>]
> alloc_xenheap_pages+0x178/0x194 (PC)
> Apr 27 13:23:14.336856 (XEN) [<000000000022a670>]
> alloc_xenheap_pages+0x170/0x194 (LR)
> Apr 27 13:23:14.336886 (XEN) [<0000000000237770>] _xmalloc+0x144/0x294
> Apr 27 13:23:14.348773 (XEN) [<00000000002378d4>] _xzalloc+0x14/0x30
> Apr 27 13:23:14.348808 (XEN) [<000000000027b4e4>]
> gicv3_lpi_init_rdist+0x54/0x324
> Apr 27 13:23:14.348835 (XEN) [<0000000000279898>]
> arch/arm/gic-v3.c#gicv3_cpu_init+0x128/0x46c
> Apr 27 13:23:14.360799 (XEN) [<0000000000279bfc>]
> arch/arm/gic-v3.c#gicv3_secondary_cpu_init+0x20/0x50
> Apr 27 13:23:14.372796 (XEN) [<0000000000277054>]
> gic_init_secondary_cpu+0x18/0x30
> Apr 27 13:23:14.372829 (XEN) [<0000000000284518>]
> start_secondary+0x1a8/0x234
> Apr 27 13:23:14.372856 (XEN) [<0000010722aa4200>] 0000010722aa4200
> Apr 27 13:23:14.384793 (XEN)
> Apr 27 13:23:14.384823 (XEN)
> Apr 27 13:23:14.384845 (XEN) ****************************************
> Apr 27 13:23:14.384869 (XEN) Panic on CPU 2:
> Apr 27 13:23:14.384891 (XEN) Assertion '!in_irq() && (local_irq_is_enabled()
> || num_online_cpus() <= 1)' failed at common/page_alloc.c:2212
> Apr 27 13:23:14.396805 (XEN) ****************************************
>
> The GICv3 LPI code contains a few calls to xmalloc() that will be done while
> initializing the GIC CPU interface. I don't think we can delay the
> initialization of the LPI part past local_irq_enable(). So I think we will
> need to allocate the memory when preparing the CPU.
>
> Any thoughts?
As a general principle I think the ASSERT is a good idea, and it should
make the code better and safer. I would not change the code just to make
the ASSERT go away unless the change also improves the code.
In this case, gicv3_lpi_init_rdist and gicv3_lpi_allocate_pendtable
should be __init functions although they are not marked as __init at the
moment.
It seems to me that it is acceptable to allocate memory with interrupts
disabled during __init. I cannot see any drawbacks with it. I think we
should change the ASSERT to only trigger after __init: system_state ==
SYS_STATE_active.
What do you think?
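As a sketch of what such a relaxation could look like, here is one possible reading of the proposal in standalone C. It is an assumption-laden model, not a patch: enum model_state, MODEL_BOOT, MODEL_ACTIVE and relaxed_alloc_allowed() are hypothetical names, and keeping the in_irq check during boot is a choice made for this sketch only.

```c
#include <stdbool.h>

/* Hypothetical two-value stand-in for the global system state. */
enum model_state {
    MODEL_BOOT,   /* still initializing */
    MODEL_ACTIVE  /* models system_state == SYS_STATE_active */
};

/*
 * Relaxed variant of the allocation check: the "interrupts enabled or
 * single CPU online" requirement is only enforced once the system is
 * active. Interrupt context is still rejected in every state.
 */
bool relaxed_alloc_allowed(enum model_state state, bool in_irq,
                           bool irqs_enabled, unsigned int online_cpus)
{
    if (in_irq)
        return false;
    if (state != MODEL_ACTIVE)
        return true;

    return irqs_enabled || online_cpus <= 1;
}
```

Note that in this model a CPU brought up after the system has become active, with interrupts still disabled, would be rejected exactly as before.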
* Re: [xen-unstable-smoke test] 169781: regressions - FAIL
From: Julien Grall @ 2022-04-27 23:13 UTC
To: Stefano Stabellini
Cc: osstest service owner, xen-devel, Jan Beulich, David Vrabel,
Bertrand Marquis
Hi Stefano,
On Thu, 28 Apr 2022, 00:02 Stefano Stabellini, <sstabellini@kernel.org>
wrote
> It seems to me that it is acceptable to allocate memory with interrupt
> disabled during __init. I cannot see any drawbacks with it. I think we
> should change the ASSERT to only trigger after __init: system_state ==
> SYS_STATE_active.
>
> What do you think?
>
This would solve the immediate problem but not the long-term one (i.e. CPU
hotplug).
So I think it would be better to properly fix it right away.
Cheers,
* Re: [xen-unstable-smoke test] 169781: regressions - FAIL
From: Stefano Stabellini @ 2022-04-28 0:47 UTC
To: Julien Grall
Cc: Stefano Stabellini, osstest service owner, xen-devel,
Jan Beulich, David Vrabel, Bertrand Marquis
On Thu, 28 Apr 2022, Julien Grall wrote:
> Hi Stefano,
>
> On Thu, 28 Apr 2022, 00:02 Stefano Stabellini, <sstabellini@kernel.org> wrote
> It seems to me that it is acceptable to allocate memory with interrupt
> disabled during __init. I cannot see any drawbacks with it. I think we
> should change the ASSERT to only trigger after __init: system_state ==
> SYS_STATE_active.
>
> What do you think?
>
>
> This would solve the immediate problem but not the long term one (i.e cpu hotplug).
>
> So I think it would be better to properly fix it right away.
Yeah, you are right about cpu hotplug. I think both statements are true:
- it is true that this is supposed to work with cpu hotplug and these
functions might be directly affected by cpu hotplug (by a CPU coming
online later on)
- it is also true that it might not make sense to ASSERT at __init time
if IRQs are disabled. There might be other places, not affected by cpu
hotplug, where we do memory allocation at __init time with IRQ
disabled. It might still be a good idea to add the system_state ==
SYS_STATE_active check in the ASSERT, not to solve this specific
problem but to avoid other issues.
In regard to gicv3_lpi_allocate_pendtable, I haven't thought about the
implications of cpu hotplug for LPIs and GICv3 before. Do you envision
that in a CPU hotplug scenario gicv3_lpi_init_rdist would be called when
the extra CPU comes online?
Today gicv3_lpi_init_rdist is called based on the number of
rdist_regions without checking whether the CPU is online or offline (I think?)
* Re: [xen-unstable-smoke test] 169781: regressions - FAIL
From: Jan Beulich @ 2022-04-28 7:45 UTC
To: Julien Grall, osstest service owner
Cc: David Vrabel, Stefano Stabellini, Bertrand Marquis, xen-devel
On 27.04.2022 19:10, Julien Grall wrote:
> Hi,
>
> On 27/04/2022 17:38, osstest service owner wrote:
>> flight 169781 xen-unstable-smoke real [real]
>> flight 169785 xen-unstable-smoke real-retest [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/169781/
>> http://logs.test-lab.xenproject.org/osstest/logs/169785/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>> test-arm64-arm64-xl-xsm 8 xen-boot fail REGR. vs. 169773
>
> Well, I was overly optimistic :(. This now breaks in the ITS code:
>
> Apr 27 13:23:14.324831 (XEN) Xen call trace:
> Apr 27 13:23:14.324855 (XEN) [<000000000022a678>]
> alloc_xenheap_pages+0x178/0x194 (PC)
> Apr 27 13:23:14.336856 (XEN) [<000000000022a670>]
> alloc_xenheap_pages+0x170/0x194 (LR)
> Apr 27 13:23:14.336886 (XEN) [<0000000000237770>] _xmalloc+0x144/0x294
> Apr 27 13:23:14.348773 (XEN) [<00000000002378d4>] _xzalloc+0x14/0x30
> Apr 27 13:23:14.348808 (XEN) [<000000000027b4e4>]
> gicv3_lpi_init_rdist+0x54/0x324
> Apr 27 13:23:14.348835 (XEN) [<0000000000279898>]
> arch/arm/gic-v3.c#gicv3_cpu_init+0x128/0x46c
> Apr 27 13:23:14.360799 (XEN) [<0000000000279bfc>]
> arch/arm/gic-v3.c#gicv3_secondary_cpu_init+0x20/0x50
> Apr 27 13:23:14.372796 (XEN) [<0000000000277054>]
> gic_init_secondary_cpu+0x18/0x30
> Apr 27 13:23:14.372829 (XEN) [<0000000000284518>]
> start_secondary+0x1a8/0x234
> Apr 27 13:23:14.372856 (XEN) [<0000010722aa4200>] 0000010722aa4200
> Apr 27 13:23:14.384793 (XEN)
> Apr 27 13:23:14.384823 (XEN)
> Apr 27 13:23:14.384845 (XEN) ****************************************
> Apr 27 13:23:14.384869 (XEN) Panic on CPU 2:
> Apr 27 13:23:14.384891 (XEN) Assertion '!in_irq() &&
> (local_irq_is_enabled() || num_online_cpus() <= 1)' failed at
> common/page_alloc.c:2212
> Apr 27 13:23:14.396805 (XEN) ****************************************
>
> The GICv3 LPI code contains a few calls to xmalloc() that will be done
> while initializing the GIC CPU interface. I don't think we can delay the
> initialization of the LPI part past local_irq_enable(). So I think we
> will need to allocate the memory when preparing the CPU.
Do you have an explanation why the next flight (169800) passed?
Jan
* Re: [xen-unstable-smoke test] 169781: regressions - FAIL
From: Julien Grall @ 2022-04-28 8:45 UTC
To: Jan Beulich, osstest service owner
Cc: David Vrabel, Stefano Stabellini, Bertrand Marquis, xen-devel
Hi Jan,
On 28/04/2022 08:45, Jan Beulich wrote:
> On 27.04.2022 19:10, Julien Grall wrote:
>> Hi,
>>
>> On 27/04/2022 17:38, osstest service owner wrote:
>>> flight 169781 xen-unstable-smoke real [real]
>>> flight 169785 xen-unstable-smoke real-retest [real]
>>> http://logs.test-lab.xenproject.org/osstest/logs/169781/
>>> http://logs.test-lab.xenproject.org/osstest/logs/169785/
>>>
>>> Regressions :-(
>>>
>>> Tests which did not succeed and are blocking,
>>> including tests which could not be run:
>>> test-arm64-arm64-xl-xsm 8 xen-boot fail REGR. vs. 169773
>>
>> Well, I was overly optimistic :(. This now breaks in the ITS code:
>>
>> Apr 27 13:23:14.324831 (XEN) Xen call trace:
>> Apr 27 13:23:14.324855 (XEN) [<000000000022a678>]
>> alloc_xenheap_pages+0x178/0x194 (PC)
>> Apr 27 13:23:14.336856 (XEN) [<000000000022a670>]
>> alloc_xenheap_pages+0x170/0x194 (LR)
>> Apr 27 13:23:14.336886 (XEN) [<0000000000237770>] _xmalloc+0x144/0x294
>> Apr 27 13:23:14.348773 (XEN) [<00000000002378d4>] _xzalloc+0x14/0x30
>> Apr 27 13:23:14.348808 (XEN) [<000000000027b4e4>]
>> gicv3_lpi_init_rdist+0x54/0x324
>> Apr 27 13:23:14.348835 (XEN) [<0000000000279898>]
>> arch/arm/gic-v3.c#gicv3_cpu_init+0x128/0x46c
>> Apr 27 13:23:14.360799 (XEN) [<0000000000279bfc>]
>> arch/arm/gic-v3.c#gicv3_secondary_cpu_init+0x20/0x50
>> Apr 27 13:23:14.372796 (XEN) [<0000000000277054>]
>> gic_init_secondary_cpu+0x18/0x30
>> Apr 27 13:23:14.372829 (XEN) [<0000000000284518>]
>> start_secondary+0x1a8/0x234
>> Apr 27 13:23:14.372856 (XEN) [<0000010722aa4200>] 0000010722aa4200
>> Apr 27 13:23:14.384793 (XEN)
>> Apr 27 13:23:14.384823 (XEN)
>> Apr 27 13:23:14.384845 (XEN) ****************************************
>> Apr 27 13:23:14.384869 (XEN) Panic on CPU 2:
>> Apr 27 13:23:14.384891 (XEN) Assertion '!in_irq() &&
>> (local_irq_is_enabled() || num_online_cpus() <= 1)' failed at
>> common/page_alloc.c:2212
>> Apr 27 13:23:14.396805 (XEN) ****************************************
>>
>> The GICv3 LPI code contains a few calls to xmalloc() that will be done
>> while initializing the GIC CPU interface. I don't think we can delay the
>> initialization of the LPI part past local_irq_enable(). So I think we
>> will need to allocate the memory when preparing the CPU.
>
> Do you have an explanation why the next flight (169800) passed?
The flight 169800 ran on laxtonX (Softiron), which doesn't have a GICv3 ITS.
I thought OSSTest would try to run the next flight on the same HW to
check for heisenbugs, but maybe this doesn't happen for the smoke test?
Cheers,
--
Julien Grall
* Re: [xen-unstable-smoke test] 169781: regressions - FAIL
From: Julien Grall @ 2022-04-28 11:19 UTC
To: Stefano Stabellini, Julien Grall
Cc: osstest service owner, xen-devel, Jan Beulich, David Vrabel,
Bertrand Marquis
Hi Stefano,
On 28/04/2022 01:47, Stefano Stabellini wrote:
> On Thu, 28 Apr 2022, Julien Grall wrote:
>> Hi Stefano,
>>
>> On Thu, 28 Apr 2022, 00:02 Stefano Stabellini, <sstabellini@kernel.org> wrote
>> It seems to me that it is acceptable to allocate memory with interrupt
>> disabled during __init. I cannot see any drawbacks with it. I think we
>> should change the ASSERT to only trigger after __init: system_state ==
>> SYS_STATE_active.
>>
>> What do you think?
>>
>>
>> This would solve the immediate problem but not the long term one (i.e cpu hotplug).
>>
>> So I think it would be better to properly fix it right away.
>
> Yeah, you are right about cpu hotplug. I think both statements are true:
>
> - it is true that this is supposed to work with cpu hotplug and these
> functions might be directly affected by cpu hotplug (by a CPU coming
> online later on)
>
> - it is also true that it might not make sense to ASSERT at __init time
> if IRQs are disabled. There might be other places, not affected by cpu
> hotplug, where we do memory allocation at __init time with IRQ
> disabled. It might still be a good idea to add the system_state ==
> SYS_STATE_active check in the ASSERT, not to solve this specific
> problem but to avoid other issues.
AFAIU, it is not safe on x86 to do a TLB flush with interrupts disabled
*and* multiple CPUs running. So we can't generically relax the check.
Looking at the OSSTest results, both the Arm32 and Arm64 tests without
GICv3 ITS have passed. So it seems unnecessary to me to preemptively relax
the check just for Arm.
>
>
> In regard to gicv3_lpi_allocate_pendtable, I haven't thought about the
> implications of cpu hotplug for LPIs and GICv3 before. Do you envision
> that in a CPU hotplug scenario gicv3_lpi_init_rdist would be called when
> the extra CPU comes online?
It is already called per-CPU. See gicv3_secondary_cpu_init() ->
gicv3_cpu_init() -> gicv3_populate_rdist().
>
> Today gicv3_lpi_init_rdist is called based on the number of
> rdist_regions without checking if the CPU is online or offline (I think ?)
The re-distributors are not banked and are therefore accessible by everyone.
However, in Xen's case, each pCPU will only touch its own re-distributor
(well, aside from TYPER to figure out the ID).
The loop in gicv3_populate_rdist() will walk through all the
re-distributors to find which one corresponds to the current pCPU. Once
we find it, we will call gicv3_lpi_init_rdist() to fully initialize the
re-distributor.
I don't think we want to populate the memory for each re-distributor in
advance.
Cheers,
--
Julien Grall
* Re: [xen-unstable-smoke test] 169781: regressions - FAIL
From: Stefano Stabellini @ 2022-04-29 0:41 UTC
To: Julien Grall
Cc: Stefano Stabellini, Julien Grall, osstest service owner,
xen-devel, Jan Beulich, David Vrabel, Bertrand Marquis
On Thu, 28 Apr 2022, Julien Grall wrote:
> On 28/04/2022 01:47, Stefano Stabellini wrote:
> > On Thu, 28 Apr 2022, Julien Grall wrote:
> > > Hi Stefano,
> > >
> > > On Thu, 28 Apr 2022, 00:02 Stefano Stabellini, <sstabellini@kernel.org>
> > > wrote
> > > It seems to me that it is acceptable to allocate memory with
> > > interrupt
> > > disabled during __init. I cannot see any drawbacks with it. I think
> > > we
> > > should change the ASSERT to only trigger after __init: system_state
> > > ==
> > > SYS_STATE_active.
> > >
> > > What do you think?
> > >
> > >
> > > This would solve the immediate problem but not the long term one (i.e cpu
> > > hotplug).
> > >
> > > So I think it would be better to properly fix it right away.
> >
> > Yeah, you are right about cpu hotplug. I think both statements are true:
> >
> > - it is true that this is supposed to work with cpu hotplug and these
> > functions might be directly affected by cpu hotplug (by a CPU coming
> > online later on)
> >
> > - it is also true that it might not make sense to ASSERT at __init time
> > if IRQs are disabled. There might be other places, not affected by cpu
> > hotplug, where we do memory allocation at __init time with IRQ
> > disabled. It might still be a good idea to add the system_state ==
> > SYS_STATE_active check in the ASSERT, not to solve this specific
> > problem but to avoid other issues.
>
> AFAIU, it is not safe on x86 to do TLB flush with interrupts disabled *and*
> multiple CPUs running. So we can't generically relax the check.
>
> Looking at the OSSTest results, both Arm32 and Arm64 without GICv3 ITS tests
> have passed. So it seems unnecessary to me to preemptively relax the check
> just for Arm.
It is good news that it works already (GICv3 aside) on ARM. If you
prefer not to relax it, I am OK with it (although it makes me a bit
worried about future breakages).
> > In regard to gicv3_lpi_allocate_pendtable, I haven't thought about the
> > implications of cpu hotplug for LPIs and GICv3 before. Do you envision
> > that in a CPU hotplug scenario gicv3_lpi_init_rdist would be called when
> > the extra CPU comes online?
>
> It is already called per-CPU. See gicv3_secondary_cpu_init() ->
> gicv3_cpu_init() -> gicv3_populate_rdist().
Got it, thanks!
> > Today gicv3_lpi_init_rdist is called based on the number of
> > rdist_regions without checking if the CPU is online or offline (I think ?)
>
> The re-distributors are not banked and therefore accessible by everyone.
> However, in Xen case, each pCPU will only touch its own re-distributor (well
> aside TYPER to figure out the ID).
>
> The loop in gicv3_populate_rdist() will walk throught all the
> re-distributor to find which one corresponds to the current pCPU. Once we
> found it, we will call gicv3_lpi_init_rdist() to fully initialize the
> re-distributor.
>
> I don't think we want to populate the memory for each re-distributor in
> advance.
I agree.
Currently we do:
start_secondary
[...]
gic_init_secondary_cpu()
[...]
gicv3_lpi_init_rdist()
[...]
local_irq_enable();
Which seems to be the right sequence to me. There must be an early boot
phase where interrupts are disabled on a CPU but memory allocations are
possible. If this was x86 with the tlbflush limitation, I would suggest
to have per-cpu memory mapping areas so that we don't have to do any
global tlb flushes with interrupts disabled.
On ARM, we don't have the tlbflush limitation so we could do that but we
wouldn't have much to gain from it.
Also, this seems to be a bit of a special case, because in general we
can move drivers initializations later after local_irq_enable(). But
this is the interrupt controller driver itself -- we cannot move it
after local_irq_enable().
So maybe an ad-hoc solution could be acceptable?
The only one I can think of is to check on system_state ==
SYS_STATE_active now. In the future for CPU hotplug we could have a
per-CPU system_state, like cpu_system_state, and do a similar check.
I am totally open to other ideas, I couldn't come up with anything
better at the moment.
* Re: [xen-unstable-smoke test] 169781: regressions - FAIL
From: Julien Grall @ 2022-04-29 9:04 UTC
To: Stefano Stabellini
Cc: Julien Grall, osstest service owner, xen-devel, Jan Beulich,
David Vrabel, Bertrand Marquis
Hi Stefano,
On 29/04/2022 01:41, Stefano Stabellini wrote:
> On Thu, 28 Apr 2022, Julien Grall wrote:
>> On 28/04/2022 01:47, Stefano Stabellini wrote:
>>> On Thu, 28 Apr 2022, Julien Grall wrote:
>>>> Hi Stefano,
>>>>
>>>> On Thu, 28 Apr 2022, 00:02 Stefano Stabellini, <sstabellini@kernel.org>
>>>> wrote
>>>> It seems to me that it is acceptable to allocate memory with
>>>> interrupt
>>>> disabled during __init. I cannot see any drawbacks with it. I think
>>>> we
>>>> should change the ASSERT to only trigger after __init: system_state
>>>> ==
>>>> SYS_STATE_active.
>>>>
>>>> What do you think?
>>>>
>>>>
>>>> This would solve the immediate problem but not the long term one (i.e cpu
>>>> hotplug).
>>>>
>>>> So I think it would be better to properly fix it right away.
>>>
>>> Yeah, you are right about cpu hotplug. I think both statements are true:
>>>
>>> - it is true that this is supposed to work with cpu hotplug and these
>>> functions might be directly affected by cpu hotplug (by a CPU coming
>>> online later on)
>>>
>>> - it is also true that it might not make sense to ASSERT at __init time
>>> if IRQs are disabled. There might be other places, not affected by cpu
>>> hotplug, where we do memory allocation at __init time with IRQ
>>> disabled. It might still be a good idea to add the system_state ==
>>> SYS_STATE_active check in the ASSERT, not to solve this specific
>>> problem but to avoid other issues.
>>
>> AFAIU, it is not safe on x86 to do TLB flush with interrupts disabled *and*
>> multiple CPUs running. So we can't generically relax the check.
>>
>> Looking at the OSSTest results, both Arm32 and Arm64 without GICv3 ITS tests
>> have passed. So it seems unnecessary to me to preemptively relax the check
>> just for Arm.
>
> It is good news that it works already (GICv3 aside) on ARM. If you
> prefer not to relax it, I am OK with it (although it makes me a bit
> worried about future breakages).
Bear in mind this is a debug-only breakage; production builds will work
fine. With any ASSERT() affecting a large code base, it is going to be
difficult to find all the potential misuses. So we have to rely on wider
testing and fix them as they get reported.
If we relax the check, then we are never going to be able to harden the
code in a timely manner.
>>> In regard to gicv3_lpi_allocate_pendtable, I haven't thought about the
>>> implications of cpu hotplug for LPIs and GICv3 before. Do you envision
>>> that in a CPU hotplug scenario gicv3_lpi_init_rdist would be called when
>>> the extra CPU comes online?
>>
>> It is already called per-CPU. See gicv3_secondary_cpu_init() ->
>> gicv3_cpu_init() -> gicv3_populate_rdist().
>
> Got it, thanks!
>
>
>>> Today gicv3_lpi_init_rdist is called based on the number of
>>> rdist_regions without checking if the CPU is online or offline (I think ?)
>>
>> The re-distributors are not banked and therefore accessible by everyone.
>> However, in Xen case, each pCPU will only touch its own re-distributor (well
>> aside TYPER to figure out the ID).
>>
>> The loop in gicv3_populate_rdist() will walk throught all the
>> re-distributor to find which one corresponds to the current pCPU. Once we
>> found it, we will call gicv3_lpi_init_rdist() to fully initialize the
>> re-distributor.
>>
>> I don't think we want to populate the memory for each re-distributor in
>> advance.
>
> I agree.
>
> Currently we do:
>
> start_secondary
> [...]
> gic_init_secondary_cpu()
> [...]
> gicv3_lpi_init_rdist()
> [...]
> local_irq_enable();
>
> Which seems to be the right sequence to me. There must be an early boot
> phase where interrupts are disabled on a CPU but memory allocations are
> possible. If this was x86 with the tlbflush limitation, I would suggest
> to have per-cpu memory mapping areas so that we don't have to do any
> global tlb flushes with interrupts disabled.
>
> On ARM, we don't have the tlbflush limitation so we could do that but we
> wouldn't have much to gain from it.
>
> Also, this seems to be a bit of a special case, because in general we
> can move drivers initializations later after local_irq_enable(). But
> this is the interrupt controller driver itself -- we cannot move it
> after local_irq_enable().
>
> So maybe an ad-hoc solution could be acceptable?
We don't need any ad-hoc solution here. We can register a CPU notifier
that will notify us when a CPU is being prepared. Something like the
patch below should work (untested so far):
diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
index e1594dd20e4c..ccf4868540f5 100644
--- a/xen/arch/arm/gic-v3-lpi.c
+++ b/xen/arch/arm/gic-v3-lpi.c
@@ -18,6 +18,7 @@
* along with this program; If not, see <http://www.gnu.org/licenses/>.
*/
+#include <xen/cpu.h>
#include <xen/lib.h>
#include <xen/mm.h>
#include <xen/param.h>
@@ -234,18 +235,13 @@ void gicv3_lpi_update_host_entry(uint32_t host_lpi, int domain_id,
write_u64_atomic(&hlpip->data, hlpi.data);
}
-static int gicv3_lpi_allocate_pendtable(uint64_t *reg)
+static int gicv3_lpi_allocate_pendtable(unsigned int cpu)
{
- uint64_t val;
void *pendtable;
- if ( this_cpu(lpi_redist).pending_table )
+ if ( per_cpu(lpi_redist, cpu).pending_table )
return -EBUSY;
- val = GIC_BASER_CACHE_RaWaWb <<
GICR_PENDBASER_INNER_CACHEABILITY_SHIFT;
- val |= GIC_BASER_CACHE_SameAsInner <<
GICR_PENDBASER_OUTER_CACHEABILITY_SHIFT;
- val |= GIC_BASER_InnerShareable << GICR_PENDBASER_SHAREABILITY_SHIFT;
-
/*
* The pending table holds one bit per LPI and even covers bits for
* interrupt IDs below 8192, so we allocate the full range.
@@ -265,13 +261,38 @@ static int gicv3_lpi_allocate_pendtable(uint64_t *reg)
clean_and_invalidate_dcache_va_range(pendtable,
lpi_data.max_host_lpi_ids / 8);
- this_cpu(lpi_redist).pending_table = pendtable;
+ per_cpu(lpi_redist, cpu).pending_table = pendtable;
- val |= GICR_PENDBASER_PTZ;
+ return 0;
+}
+
+static int gicv3_lpi_set_pendtable(void __iomem *rdist_base)
+{
+ const void *pendtable = this_cpu(lpi_redist).pending_table;
+ uint64_t val;
+
+ if ( !pendtable )
+ return -ENOMEM;
+ ASSERT(!(virt_to_maddr(pendtable) & ~GENMASK(51, 16)));
+
+ val = GIC_BASER_CACHE_RaWaWb <<
GICR_PENDBASER_INNER_CACHEABILITY_SHIFT;
+ val |= GIC_BASER_CACHE_SameAsInner <<
GICR_PENDBASER_OUTER_CACHEABILITY_SHIFT;
+ val |= GIC_BASER_InnerShareable << GICR_PENDBASER_SHAREABILITY_SHIFT;
+ val |= GICR_PENDBASER_PTZ;
val |= virt_to_maddr(pendtable);
- *reg = val;
+ writeq_relaxed(val, rdist_base + GICR_PENDBASER);
+ val = readq_relaxed(rdist_base + GICR_PENDBASER);
+
+ /* If the hardware reports non-shareable, drop cacheability as well. */
+ if ( !(val & GICR_PENDBASER_SHAREABILITY_MASK) )
+ {
+ val &= ~GICR_PENDBASER_INNER_CACHEABILITY_MASK;
+ val |= GIC_BASER_CACHE_nC <<
GICR_PENDBASER_INNER_CACHEABILITY_SHIFT;
+
+ writeq_relaxed(val, rdist_base + GICR_PENDBASER);
+ }
return 0;
}
@@ -340,7 +361,6 @@ static int gicv3_lpi_set_proptable(void __iomem *
rdist_base)
int gicv3_lpi_init_rdist(void __iomem * rdist_base)
{
uint32_t reg;
- uint64_t table_reg;
int ret;
/* We don't support LPIs without an ITS. */
@@ -352,24 +372,33 @@ int gicv3_lpi_init_rdist(void __iomem * rdist_base)
if ( reg & GICR_CTLR_ENABLE_LPIS )
return -EBUSY;
- ret = gicv3_lpi_allocate_pendtable(&table_reg);
+ ret = gicv3_lpi_set_pendtable(rdist_base);
if ( ret )
return ret;
- writeq_relaxed(table_reg, rdist_base + GICR_PENDBASER);
- table_reg = readq_relaxed(rdist_base + GICR_PENDBASER);
- /* If the hardware reports non-shareable, drop cacheability as well. */
- if ( !(table_reg & GICR_PENDBASER_SHAREABILITY_MASK) )
- {
- table_reg &= ~GICR_PENDBASER_INNER_CACHEABILITY_MASK;
- table_reg |= GIC_BASER_CACHE_nC <<
GICR_PENDBASER_INNER_CACHEABILITY_SHIFT;
+ return gicv3_lpi_set_proptable(rdist_base);
+}
+
+static int cpu_callback(struct notifier_block *nfb, unsigned long action,
+ void *hcpu)
+{
+ unsigned long cpu = (unsigned long)hcpu;
+ int rc = 0;
- writeq_relaxed(table_reg, rdist_base + GICR_PENDBASER);
+ switch ( action )
+ {
+ case CPU_UP_PREPARE:
+ rc = gicv3_lpi_allocate_pendtable(cpu);
+ break;
}
- return gicv3_lpi_set_proptable(rdist_base);
+ return !rc ? NOTIFY_DONE : notifier_from_errno(rc);
}
+static struct notifier_block cpu_nfb = {
+ .notifier_call = cpu_callback,
+};
+
static unsigned int max_lpi_bits = 20;
integer_param("max_lpi_bits", max_lpi_bits);
@@ -381,6 +410,7 @@ integer_param("max_lpi_bits", max_lpi_bits);
int gicv3_lpi_init_host_lpis(unsigned int host_lpi_bits)
{
unsigned int nr_lpi_ptrs;
+ int rc;
/* We rely on the data structure being atomically accessible. */
BUILD_BUG_ON(sizeof(union host_lpi) > sizeof(unsigned long));
@@ -413,7 +443,14 @@ int gicv3_lpi_init_host_lpis(unsigned int
host_lpi_bits)
printk("GICv3: using at most %lu LPIs on the host.\n",
MAX_NR_HOST_LPIS);
- return 0;
+ /* Register the CPU notifier and allocate memory for the boot CPU */
+ register_cpu_notifier(&cpu_nfb);
+ rc = gicv3_lpi_allocate_pendtable(smp_processor_id());
+ if ( rc )
+ printk(XENLOG_ERR "Unable to allocate the pendtable for CPU%u\n",
+ smp_processor_id());
+
+ return rc;
}
static int find_unused_host_lpi(uint32_t start, uint32_t *index)
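The split the patch makes can be sketched outside Xen as two steps: allocate the pending table from the prepare notification, which runs on a CPU that is already up, then only program it from the new CPU while its interrupts are still disabled. This is a minimal standalone model, not the Xen code: NR_CPUS, PENDTABLE_BYTES, the two-value cpu_action enum, the lpi_redist array and the simplified cpu_callback() signature are all assumptions made for illustration, and calloc() stands in for the real allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_CPUS         4      /* hypothetical, illustration only */
#define PENDTABLE_BYTES 4096   /* hypothetical, illustration only */

/* Simplified stand-ins for the hotplug actions; not Xen's values. */
enum cpu_action { CPU_UP_PREPARE, CPU_ONLINE };

/* Stand-in for the per-CPU lpi_redist state. */
static struct {
    void *pending_table;
    bool programmed;
} lpi_redist[NR_CPUS];

/* Step 1: runs on a CPU that is already up, so allocating is fine. */
static int allocate_pendtable(unsigned int cpu)
{
    assert(cpu < NR_CPUS);

    if ( lpi_redist[cpu].pending_table )
        return -EBUSY;

    lpi_redist[cpu].pending_table = calloc(1, PENDTABLE_BYTES);

    return lpi_redist[cpu].pending_table ? 0 : -ENOMEM;
}

/* Step 2: runs on the new CPU before IRQs are enabled; never allocates. */
static int set_pendtable(unsigned int cpu)
{
    assert(cpu < NR_CPUS);

    if ( !lpi_redist[cpu].pending_table )
        return -ENOMEM;

    lpi_redist[cpu].programmed = true; /* models the GICR_PENDBASER write */

    return 0;
}

/* Notifier body: only the prepare action triggers an allocation. */
static int cpu_callback(enum cpu_action action, unsigned int cpu)
{
    return action == CPU_UP_PREPARE ? allocate_pendtable(cpu) : 0;
}
```

In this model, cpu_callback(CPU_UP_PREPARE, 1) followed by set_pendtable(1) succeeds, while set_pendtable() on a CPU that was never prepared returns -ENOMEM, mirroring the !pendtable check in the patch.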
Cheers,
--
Julien Grall
* Re: [xen-unstable-smoke test] 169781: regressions - FAIL
2022-04-29 9:04 ` Julien Grall
@ 2022-04-29 16:15 ` Stefano Stabellini
2022-04-29 16:47 ` Julien Grall
0 siblings, 1 reply; 12+ messages in thread
From: Stefano Stabellini @ 2022-04-29 16:15 UTC (permalink / raw)
To: Julien Grall
Cc: Stefano Stabellini, Julien Grall, osstest service owner,
xen-devel, Jan Beulich, David Vrabel, Bertrand Marquis
On Fri, 29 Apr 2022, Julien Grall wrote:
> On 29/04/2022 01:41, Stefano Stabellini wrote:
> > On Thu, 28 Apr 2022, Julien Grall wrote:
> > > On 28/04/2022 01:47, Stefano Stabellini wrote:
> > > > On Thu, 28 Apr 2022, Julien Grall wrote:
> > > > > Hi Stefano,
> > > > >
> > > > > On Thu, 28 Apr 2022, 00:02 Stefano Stabellini <sstabellini@kernel.org>
> > > > > wrote:
> > > > > It seems to me that it is acceptable to allocate memory with
> > > > > interrupt disabled during __init. I cannot see any drawbacks with it.
> > > > > I think we should change the ASSERT to only trigger after __init:
> > > > > system_state == SYS_STATE_active.
> > > > >
> > > > > What do you think?
> > > > >
> > > > >
> > > > > This would solve the immediate problem but not the long term one
> > > > > (i.e. cpu hotplug).
> > > > >
> > > > > So I think it would be better to properly fix it right away.
> > > >
> > > > Yeah, you are right about cpu hotplug. I think both statements are true:
> > > >
> > > > - it is true that this is supposed to work with cpu hotplug and these
> > > > functions might be directly affected by cpu hotplug (by a CPU coming
> > > > online later on)
> > > >
> > > > - it is also true that it might not make sense to ASSERT at __init time
> > > > if IRQs are disabled. There might be other places, not affected by
> > > > cpu hotplug, where we do memory allocation at __init time with IRQ
> > > > disabled. It might still be a good idea to add the system_state ==
> > > > SYS_STATE_active check in the ASSERT, not to solve this specific
> > > > problem but to avoid other issues.
> > >
> > > AFAIU, it is not safe on x86 to do TLB flush with interrupts disabled *and*
> > > multiple CPUs running. So we can't generically relax the check.
> > >
> > > Looking at the OSSTest results, both Arm32 and Arm64 without GICv3 ITS tests
> > > have passed. So it seems unnecessary to me to preemptively relax the check
> > > just for Arm.
> >
> > It is good news that it works already (GICv3 aside) on ARM. If you
> > prefer not to relax it, I am OK with it (although it makes me a bit
> > worried about future breakages).
>
> Bear in mind this is a debug-only breakage; production builds will work fine.
> With any ASSERT() affecting a large code base, it is going to be difficult to
> find all the potential misuses. So we have to rely on wider testing and fix
> them as they get reported.
>
> If we relax the check, then we are never going to be able to harden the code
> in a timely manner.
>
> > > > In regard to gicv3_lpi_allocate_pendtable, I haven't thought about the
> > > > implications of cpu hotplug for LPIs and GICv3 before. Do you envision
> > > > that in a CPU hotplug scenario gicv3_lpi_init_rdist would be called when
> > > > the extra CPU comes online?
> > >
> > > It is already called per-CPU. See gicv3_secondary_cpu_init() ->
> > > gicv3_cpu_init() -> gicv3_populate_rdist().
> >
> > Got it, thanks!
> >
> >
> > > > Today gicv3_lpi_init_rdist is called based on the number of
> > > > rdist_regions without checking if the CPU is online or offline (I
> > > > think?)
> > >
> > > The re-distributors are not banked and therefore accessible by everyone.
> > > However, in Xen's case, each pCPU will only touch its own re-distributor
> > > (well, aside from TYPER to figure out the ID).
> > >
> > > The loop in gicv3_populate_rdist() will walk through all the
> > > re-distributors to find which one corresponds to the current pCPU. Once we
> > > find it, we will call gicv3_lpi_init_rdist() to fully initialize the
> > > re-distributor.
> > >
> > > I don't think we want to populate the memory for each re-distributor in
> > > advance.
> >
> > I agree.
> >
> > Currently we do:
> >
> > start_secondary
> > [...]
> > gic_init_secondary_cpu()
> > [...]
> > gicv3_lpi_init_rdist()
> > [...]
> > local_irq_enable();
> >
> > Which seems to be the right sequence to me. There must be an early boot
> > phase where interrupts are disabled on a CPU but memory allocations are
> > possible. If this was x86 with the tlbflush limitation, I would suggest
> > to have per-cpu memory mapping areas so that we don't have to do any
> > global tlb flushes with interrupts disabled.
> >
> > On ARM, we don't have the tlbflush limitation so we could do that but we
> > wouldn't have much to gain from it.
> >
> > Also, this seems to be a bit of a special case, because in general we
> > can move drivers initializations later after local_irq_enable(). But
> > this is the interrupt controller driver itself -- we cannot move it
> > after local_irq_enable().
> >
> > So maybe an ad-hoc solution could be acceptable?
>
> We don't need any ad-hoc solution here. We can register a CPU notifier that
> will notify us when a CPU will be prepared. Something like below should work
> (untested yet):
The CPU notifier is a good idea. It looks like the patch below got
corrupted somehow by the mailer. If you send it as a proper patch I am
happy to have a look.
> diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
> index e1594dd20e4c..ccf4868540f5 100644
> --- a/xen/arch/arm/gic-v3-lpi.c
> +++ b/xen/arch/arm/gic-v3-lpi.c
> @@ -18,6 +18,7 @@
> * along with this program; If not, see <http://www.gnu.org/licenses/>.
> */
>
> +#include <xen/cpu.h>
> #include <xen/lib.h>
> #include <xen/mm.h>
> #include <xen/param.h>
> @@ -234,18 +235,13 @@ void gicv3_lpi_update_host_entry(uint32_t host_lpi, int
> domain_id,
> write_u64_atomic(&hlpip->data, hlpi.data);
> }
>
> -static int gicv3_lpi_allocate_pendtable(uint64_t *reg)
> +static int gicv3_lpi_allocate_pendtable(unsigned int cpu)
> {
> - uint64_t val;
> void *pendtable;
>
> - if ( this_cpu(lpi_redist).pending_table )
> + if ( per_cpu(lpi_redist, cpu).pending_table )
> return -EBUSY;
>
> - val = GIC_BASER_CACHE_RaWaWb << GICR_PENDBASER_INNER_CACHEABILITY_SHIFT;
> - val |= GIC_BASER_CACHE_SameAsInner <<
> GICR_PENDBASER_OUTER_CACHEABILITY_SHIFT;
> - val |= GIC_BASER_InnerShareable << GICR_PENDBASER_SHAREABILITY_SHIFT;
> -
> /*
> * The pending table holds one bit per LPI and even covers bits for
> * interrupt IDs below 8192, so we allocate the full range.
> @@ -265,13 +261,38 @@ static int gicv3_lpi_allocate_pendtable(uint64_t *reg)
> clean_and_invalidate_dcache_va_range(pendtable,
> lpi_data.max_host_lpi_ids / 8);
>
> - this_cpu(lpi_redist).pending_table = pendtable;
> + per_cpu(lpi_redist, cpu).pending_table = pendtable;
>
> - val |= GICR_PENDBASER_PTZ;
> + return 0;
> +}
> +
> +static int gicv3_lpi_set_pendtable(void __iomem *rdist_base)
> +{
> + const void *pendtable = this_cpu(lpi_redist).pending_table;
> + uint64_t val;
> +
> + if ( !pendtable )
> + return -ENOMEM;
>
> + ASSERT(!(virt_to_maddr(pendtable) & ~GENMASK(51, 16)));
> +
> + val = GIC_BASER_CACHE_RaWaWb << GICR_PENDBASER_INNER_CACHEABILITY_SHIFT;
> + val |= GIC_BASER_CACHE_SameAsInner <<
> GICR_PENDBASER_OUTER_CACHEABILITY_SHIFT;
> + val |= GIC_BASER_InnerShareable << GICR_PENDBASER_SHAREABILITY_SHIFT;
> + val |= GICR_PENDBASER_PTZ;
> val |= virt_to_maddr(pendtable);
>
> - *reg = val;
> + writeq_relaxed(val, rdist_base + GICR_PENDBASER);
> + val = readq_relaxed(rdist_base + GICR_PENDBASER);
> +
> + /* If the hardware reports non-shareable, drop cacheability as well. */
> + if ( !(val & GICR_PENDBASER_SHAREABILITY_MASK) )
> + {
> + val &= ~GICR_PENDBASER_INNER_CACHEABILITY_MASK;
> + val |= GIC_BASER_CACHE_nC << GICR_PENDBASER_INNER_CACHEABILITY_SHIFT;
> +
> + writeq_relaxed(val, rdist_base + GICR_PENDBASER);
> + }
>
> return 0;
> }
> @@ -340,7 +361,6 @@ static int gicv3_lpi_set_proptable(void __iomem *
> rdist_base)
> int gicv3_lpi_init_rdist(void __iomem * rdist_base)
> {
> uint32_t reg;
> - uint64_t table_reg;
> int ret;
>
> /* We don't support LPIs without an ITS. */
> @@ -352,24 +372,33 @@ int gicv3_lpi_init_rdist(void __iomem * rdist_base)
> if ( reg & GICR_CTLR_ENABLE_LPIS )
> return -EBUSY;
>
> - ret = gicv3_lpi_allocate_pendtable(&table_reg);
> + ret = gicv3_lpi_set_pendtable(rdist_base);
> if ( ret )
> return ret;
> - writeq_relaxed(table_reg, rdist_base + GICR_PENDBASER);
> - table_reg = readq_relaxed(rdist_base + GICR_PENDBASER);
>
> - /* If the hardware reports non-shareable, drop cacheability as well. */
> - if ( !(table_reg & GICR_PENDBASER_SHAREABILITY_MASK) )
> - {
> - table_reg &= ~GICR_PENDBASER_INNER_CACHEABILITY_MASK;
> - table_reg |= GIC_BASER_CACHE_nC <<
> GICR_PENDBASER_INNER_CACHEABILITY_SHIFT;
> + return gicv3_lpi_set_proptable(rdist_base);
> +}
> +
> +static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> + void *hcpu)
> +{
> + unsigned long cpu = (unsigned long)hcpu;
> + int rc = 0;
>
> - writeq_relaxed(table_reg, rdist_base + GICR_PENDBASER);
> + switch ( action )
> + {
> + case CPU_UP_PREPARE:
> + rc = gicv3_lpi_allocate_pendtable(cpu);
> + break;
> }
>
> - return gicv3_lpi_set_proptable(rdist_base);
> + return !rc ? NOTIFY_DONE : notifier_from_errno(rc);
> }
>
> +static struct notifier_block cpu_nfb = {
> + .notifier_call = cpu_callback,
> +};
> +
> static unsigned int max_lpi_bits = 20;
> integer_param("max_lpi_bits", max_lpi_bits);
>
> @@ -381,6 +410,7 @@ integer_param("max_lpi_bits", max_lpi_bits);
> int gicv3_lpi_init_host_lpis(unsigned int host_lpi_bits)
> {
> unsigned int nr_lpi_ptrs;
> + int rc;
>
> /* We rely on the data structure being atomically accessible. */
> BUILD_BUG_ON(sizeof(union host_lpi) > sizeof(unsigned long));
> @@ -413,7 +443,14 @@ int gicv3_lpi_init_host_lpis(unsigned int host_lpi_bits)
>
> printk("GICv3: using at most %lu LPIs on the host.\n", MAX_NR_HOST_LPIS);
>
> - return 0;
> + /* Register the CPU notifier and allocate memory for the boot CPU */
> + register_cpu_notifier(&cpu_nfb);
> + rc = gicv3_lpi_allocate_pendtable(smp_processor_id());
> + if ( rc )
> + printk(XENLOG_ERR "Unable to allocate the pendtable for CPU%u\n",
> + smp_processor_id());
> +
> + return rc;
> }
>
> static int find_unused_host_lpi(uint32_t start, uint32_t *index)
>
> Cheers,
>
> --
> Julien Grall
>
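For reference, the address check in gicv3_lpi_set_pendtable() is there to guarantee that the pending table's machine address fits in the address field of GICR_PENDBASER, which going by the GENMASK(51, 16) in the patch is bits 51:16 (so the table has to be 64KB aligned and below 2^52). A minimal standalone sketch of that predicate follows; GENMASK_U64, pendbaser_addr_ok() and pendbaser_addr_field() are hypothetical names used only here, not Xen helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits h..l set, inclusive; a 64-bit GENMASK-style helper (hypothetical). */
#define GENMASK_U64(h, l) \
    (((~UINT64_C(0)) << (l)) & ((~UINT64_C(0)) >> (63 - (h))))

/* True when no address bit falls outside the 51:16 field. */
static bool pendbaser_addr_ok(uint64_t maddr)
{
    return !(maddr & ~GENMASK_U64(51, 16));
}

/* Returns the bits that would land in the address field. */
static uint64_t pendbaser_addr_field(uint64_t maddr)
{
    assert(pendbaser_addr_ok(maddr));

    return maddr & GENMASK_U64(51, 16);
}
```

An address such as 0x80010000 passes, whereas 0x80001000 (only 4KB aligned) or anything with bit 52 set does not.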
* Re: [xen-unstable-smoke test] 169781: regressions - FAIL
2022-04-29 16:15 ` Stefano Stabellini
@ 2022-04-29 16:47 ` Julien Grall
0 siblings, 0 replies; 12+ messages in thread
From: Julien Grall @ 2022-04-29 16:47 UTC (permalink / raw)
To: Stefano Stabellini
Cc: Julien Grall, osstest service owner, xen-devel, Jan Beulich,
David Vrabel, Bertrand Marquis
On 29/04/2022 17:15, Stefano Stabellini wrote:
>>> Which seems to be the right sequence to me. There must be an early boot
>>> phase where interrupts are disabled on a CPU but memory allocations are
>>> possible. If this was x86 with the tlbflush limitation, I would suggest
>>> to have per-cpu memory mapping areas so that we don't have to do any
>>> global tlb flushes with interrupts disabled.
>>>
>>> On ARM, we don't have the tlbflush limitation so we could do that but we
>>> wouldn't have much to gain from it.
>>>
>>> Also, this seems to be a bit of a special case, because in general we
>>> can move drivers initializations later after local_irq_enable(). But
>>> this is the interrupt controller driver itself -- we cannot move it
>>> after local_irq_enable().
>>>
>>> So maybe an ad-hoc solution could be acceptable?
>>
>> We don't need any ad-hoc solution here. We can register a CPU notifier that
>> will notify us when a CPU will be prepared. Something like below should work
>> (untested yet):
>
> The CPU notifier is a good idea. It looks like the patch below got
> corrupted somehow by the mailer. If you send it as a proper patch I am
> happy to have a look.
Doh. I will send a proper patch once I have done some testing.
Cheers,
--
Julien Grall
Thread overview: 12+ messages
2022-04-27 16:38 [xen-unstable-smoke test] 169781: regressions - FAIL osstest service owner
2022-04-27 17:10 ` Julien Grall
2022-04-27 23:02 ` Stefano Stabellini
2022-04-27 23:13 ` Julien Grall
2022-04-28 0:47 ` Stefano Stabellini
2022-04-28 11:19 ` Julien Grall
2022-04-29 0:41 ` Stefano Stabellini
2022-04-29 9:04 ` Julien Grall
2022-04-29 16:15 ` Stefano Stabellini
2022-04-29 16:47 ` Julien Grall
2022-04-28 7:45 ` Jan Beulich
2022-04-28 8:45 ` Julien Grall