All of lore.kernel.org
 help / color / mirror / Atom feed
* [xen-unstable-smoke test] 169781: regressions - FAIL
@ 2022-04-27 16:38 osstest service owner
  2022-04-27 17:10 ` Julien Grall
  0 siblings, 1 reply; 12+ messages in thread
From: osstest service owner @ 2022-04-27 16:38 UTC (permalink / raw)
  To: xen-devel

flight 169781 xen-unstable-smoke real [real]
flight 169785 xen-unstable-smoke real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/169781/
http://logs.test-lab.xenproject.org/osstest/logs/169785/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-arm64-arm64-xl-xsm       8 xen-boot                 fail REGR. vs. 169773

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt     15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail   never pass

version targeted for testing:
 xen                  fa6dc0879ffd3dffffaea2837953c7a8761a9ba0
baseline version:
 xen                  163071b1800304c962756789b4ef0ddb978059ba

Last test of basis   169773  2022-04-27 08:01:54 Z    0 days
Testing same since   169781  2022-04-27 12:01:52 Z    0 days    1 attempts

------------------------------------------------------------
People who touched revisions under test:
  David Vrabel <dvrabel@amazon.co.uk>
  Julien Grall <jgrall@amazon.com>

jobs:
 build-arm64-xsm                                              pass    
 build-amd64                                                  pass    
 build-armhf                                                  pass    
 build-amd64-libvirt                                          pass    
 test-armhf-armhf-xl                                          pass    
 test-arm64-arm64-xl-xsm                                      fail    
 test-amd64-amd64-xl-qemuu-debianhvm-amd64                    pass    
 test-amd64-amd64-libvirt                                     pass    


------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
    http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
    http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
    http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

------------------------------------------------------------
commit fa6dc0879ffd3dffffaea2837953c7a8761a9ba0
Author: David Vrabel <dvrabel@amazon.co.uk>
Date:   Tue Apr 26 10:33:01 2022 +0200

    page_alloc: assert IRQs are enabled in heap alloc/free
    
    Heap pages can only be safely allocated and freed with interrupts
    enabled as they may require a TLB flush which may send IPIs (on x86).
    
    Normally spinlock debugging would catch calls from the incorrect
    context, but not from stop_machine_run() action functions as these are
    called with spin lock debugging disabled.
    
    Enhance the assertions in alloc_xenheap_pages() and
    alloc_domheap_pages() to check interrupts are enabled. For consistency
    the same asserts are used when freeing heap pages.
    
    As an exception, when only 1 PCPU is online, allocations are permitted
    with interrupts disabled as any TLB flushes would be local only. This
    is necessary during early boot.
    
    Signed-off-by: David Vrabel <dvrabel@amazon.co.uk>
    Reviewed-by: Jan Beulich <jbeulich@suse.com>

commit fbd2445558beff90eb9607308f0845b18a7a2b5a
Author: Julien Grall <jgrall@amazon.com>
Date:   Tue Apr 26 21:06:29 2022 +0100

    xen/arm: alternative: Don't call vmap() within stop_machine_run()
    
    Commit 88a037e2cfe1 "page_alloc: assert IRQs are enabled in heap
    alloc/free" extended the checks in the buddy allocator to catch
    any use of the helpers from context with interrupts disabled.
    
    Unfortunately, the rule is not followed in the alternative code and
    this will result to crash at boot with debug enabled:
    
    (XEN) Xen call trace:
    (XEN)    [<0022a510>] alloc_xenheap_pages+0x120/0x150 (PC)
    (XEN)    [<00000000>] 00000000 (LR)
    (XEN)    [<002736ac>] arch/arm/mm.c#xen_pt_update+0x144/0x6e4
    (XEN)    [<002740d4>] map_pages_to_xen+0x10/0x20
    (XEN)    [<00236864>] __vmap+0x400/0x4a4
    (XEN)    [<0026aee8>] arch/arm/alternative.c#__apply_alternatives_multi_stop+0x144/0x1ec
    (XEN)    [<0022fe40>] stop_machine_run+0x23c/0x300
    (XEN)    [<002c40c4>] apply_alternatives_all+0x34/0x5c
    (XEN)    [<002ce3e8>] start_xen+0xcb8/0x1024
    (XEN)    [<00200068>] arch/arm/arm32/head.o#primary_switched+0xc/0x1c
    
    The interrupts will be disabled by the state machine in stop_machine_run(),
    hence why the ASSERT is hit.
    
    For now the patch extending the checks has been reverted, but it would
    be good to re-introduce it (allocation with interrupts disabled is not
    desirable).
    
    So move the re-mapping of Xen to the caller of stop_machine_run().
    
    Signed-off-by: Julien Grall <jgrall@amazon.com>
    Cc: David Vrabel <dvrabel@amazon.co.uk>
    Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(qemu changes not included)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [xen-unstable-smoke test] 169781: regressions - FAIL
  2022-04-27 16:38 [xen-unstable-smoke test] 169781: regressions - FAIL osstest service owner
@ 2022-04-27 17:10 ` Julien Grall
  2022-04-27 23:02   ` Stefano Stabellini
  2022-04-28  7:45   ` Jan Beulich
  0 siblings, 2 replies; 12+ messages in thread
From: Julien Grall @ 2022-04-27 17:10 UTC (permalink / raw)
  To: osstest service owner, xen-devel
  Cc: Jan Beulich, David Vrabel, Stefano Stabellini, Bertrand Marquis

Hi,

On 27/04/2022 17:38, osstest service owner wrote:
> flight 169781 xen-unstable-smoke real [real]
> flight 169785 xen-unstable-smoke real-retest [real]
> http://logs.test-lab.xenproject.org/osstest/logs/169781/
> http://logs.test-lab.xenproject.org/osstest/logs/169785/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>   test-arm64-arm64-xl-xsm       8 xen-boot                 fail REGR. vs. 169773

Well, I was overly optimistic :(. This now breaks in the ITS code:

Apr 27 13:23:14.324831 (XEN) Xen call trace:
Apr 27 13:23:14.324855 (XEN)    [<000000000022a678>] 
alloc_xenheap_pages+0x178/0x194 (PC)
Apr 27 13:23:14.336856 (XEN)    [<000000000022a670>] 
alloc_xenheap_pages+0x170/0x194 (LR)
Apr 27 13:23:14.336886 (XEN)    [<0000000000237770>] _xmalloc+0x144/0x294
Apr 27 13:23:14.348773 (XEN)    [<00000000002378d4>] _xzalloc+0x14/0x30
Apr 27 13:23:14.348808 (XEN)    [<000000000027b4e4>] 
gicv3_lpi_init_rdist+0x54/0x324
Apr 27 13:23:14.348835 (XEN)    [<0000000000279898>] 
arch/arm/gic-v3.c#gicv3_cpu_init+0x128/0x46c
Apr 27 13:23:14.360799 (XEN)    [<0000000000279bfc>] 
arch/arm/gic-v3.c#gicv3_secondary_cpu_init+0x20/0x50
Apr 27 13:23:14.372796 (XEN)    [<0000000000277054>] 
gic_init_secondary_cpu+0x18/0x30
Apr 27 13:23:14.372829 (XEN)    [<0000000000284518>] 
start_secondary+0x1a8/0x234
Apr 27 13:23:14.372856 (XEN)    [<0000010722aa4200>] 0000010722aa4200
Apr 27 13:23:14.384793 (XEN)
Apr 27 13:23:14.384823 (XEN)
Apr 27 13:23:14.384845 (XEN) ****************************************
Apr 27 13:23:14.384869 (XEN) Panic on CPU 2:
Apr 27 13:23:14.384891 (XEN) Assertion '!in_irq() && 
(local_irq_is_enabled() || num_online_cpus() <= 1)' failed at 
common/page_alloc.c:2212
Apr 27 13:23:14.396805 (XEN) ****************************************

The GICv3 LPI code contains a few calls to xmalloc() that will be done 
while initializing the GIC CPU interface. I don't think we can delay the 
initialization of the LPI part past local_irq_enable(). So I think we 
will need to allocate the memory when preparing the CPU.

Any thoughts?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [xen-unstable-smoke test] 169781: regressions - FAIL
  2022-04-27 17:10 ` Julien Grall
@ 2022-04-27 23:02   ` Stefano Stabellini
  2022-04-27 23:13     ` Julien Grall
  2022-04-28  7:45   ` Jan Beulich
  1 sibling, 1 reply; 12+ messages in thread
From: Stefano Stabellini @ 2022-04-27 23:02 UTC (permalink / raw)
  To: Julien Grall
  Cc: osstest service owner, xen-devel, Jan Beulich, David Vrabel,
	Stefano Stabellini, Bertrand Marquis

On Wed, 27 Apr 2022, Julien Grall wrote:
> On 27/04/2022 17:38, osstest service owner wrote:
> > flight 169781 xen-unstable-smoke real [real]
> > flight 169785 xen-unstable-smoke real-retest [real]
> > http://logs.test-lab.xenproject.org/osstest/logs/169781/
> > http://logs.test-lab.xenproject.org/osstest/logs/169785/
> > 
> > Regressions :-(
> > 
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> >   test-arm64-arm64-xl-xsm       8 xen-boot                 fail REGR. vs.
> > 169773
> 
> Well, I was overly optimistic :(. This now breaks in the ITS code:
> 
> Apr 27 13:23:14.324831 (XEN) Xen call trace:
> Apr 27 13:23:14.324855 (XEN)    [<000000000022a678>]
> alloc_xenheap_pages+0x178/0x194 (PC)
> Apr 27 13:23:14.336856 (XEN)    [<000000000022a670>]
> alloc_xenheap_pages+0x170/0x194 (LR)
> Apr 27 13:23:14.336886 (XEN)    [<0000000000237770>] _xmalloc+0x144/0x294
> Apr 27 13:23:14.348773 (XEN)    [<00000000002378d4>] _xzalloc+0x14/0x30
> Apr 27 13:23:14.348808 (XEN)    [<000000000027b4e4>]
> gicv3_lpi_init_rdist+0x54/0x324
> Apr 27 13:23:14.348835 (XEN)    [<0000000000279898>]
> arch/arm/gic-v3.c#gicv3_cpu_init+0x128/0x46c
> Apr 27 13:23:14.360799 (XEN)    [<0000000000279bfc>]
> arch/arm/gic-v3.c#gicv3_secondary_cpu_init+0x20/0x50
> Apr 27 13:23:14.372796 (XEN)    [<0000000000277054>]
> gic_init_secondary_cpu+0x18/0x30
> Apr 27 13:23:14.372829 (XEN)    [<0000000000284518>]
> start_secondary+0x1a8/0x234
> Apr 27 13:23:14.372856 (XEN)    [<0000010722aa4200>] 0000010722aa4200
> Apr 27 13:23:14.384793 (XEN)
> Apr 27 13:23:14.384823 (XEN)
> Apr 27 13:23:14.384845 (XEN) ****************************************
> Apr 27 13:23:14.384869 (XEN) Panic on CPU 2:
> Apr 27 13:23:14.384891 (XEN) Assertion '!in_irq() && (local_irq_is_enabled()
> || num_online_cpus() <= 1)' failed at common/page_alloc.c:2212
> Apr 27 13:23:14.396805 (XEN) ****************************************
> 
> The GICv3 LPI code contains a few calls to xmalloc() that will be done while
> initializing the GIC CPU interface. I don't think we can delay the
> initialization of the LPI part past local_irq_enable(). So I think we will
> need to allocate the memory when preparing the CPU.
> 
> Any thoughts?


As a general principle I think the ASSERT is a good idea, and it should
make the code better and safer. I would not change the code to make the
ASSERT go away if not to improve the code.

In this case, gicv3_lpi_init_rdist and gicv3_lpi_allocate_pendtable
should be __init functions although they are not marked as __init at the
moment.

It seems to me that it is acceptable to allocate memory with interrupt
disabled during __init. I cannot see any drawbacks with it. I think we
should change the ASSERT to only trigger after __init: system_state ==
SYS_STATE_active.

What do you think?


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [xen-unstable-smoke test] 169781: regressions - FAIL
  2022-04-27 23:02   ` Stefano Stabellini
@ 2022-04-27 23:13     ` Julien Grall
  2022-04-28  0:47       ` Stefano Stabellini
  0 siblings, 1 reply; 12+ messages in thread
From: Julien Grall @ 2022-04-27 23:13 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: osstest service owner, xen-devel, Jan Beulich, David Vrabel,
	Bertrand Marquis

[-- Attachment #1: Type: text/plain, Size: 515 bytes --]

Hi Stefano,

On Thu, 28 Apr 2022, 00:02 Stefano Stabellini, <sstabellini@kernel.org>
wrote

> It seems to me that it is acceptable to allocate memory with interrupt
> disabled during __init. I cannot see any drawbacks with it. I think we
> should change the ASSERT to only trigger after __init: system_state ==
> SYS_STATE_active.
>
> What do you think?
>

This would solve the immediate problem but not the long term one (i.e cpu
hotplug).

So I think it would be better to properly fix it right away.

Cheers,

>

[-- Attachment #2: Type: text/html, Size: 1165 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [xen-unstable-smoke test] 169781: regressions - FAIL
  2022-04-27 23:13     ` Julien Grall
@ 2022-04-28  0:47       ` Stefano Stabellini
  2022-04-28 11:19         ` Julien Grall
  0 siblings, 1 reply; 12+ messages in thread
From: Stefano Stabellini @ 2022-04-28  0:47 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, osstest service owner, xen-devel,
	Jan Beulich, David Vrabel, Bertrand Marquis

On Thu, 28 Apr 2022, Julien Grall wrote:
> Hi Stefano,
> 
> On Thu, 28 Apr 2022, 00:02 Stefano Stabellini, <sstabellini@kernel.org> wrote
>       It seems to me that it is acceptable to allocate memory with interrupt
>       disabled during __init. I cannot see any drawbacks with it. I think we
>       should change the ASSERT to only trigger after __init: system_state ==
>       SYS_STATE_active.
> 
>       What do you think?
> 
> 
> This would solve the immediate problem but not the long term one (i.e cpu hotplug).
> 
> So I think it would be better to properly fix it right away.

Yeah, you are right about cpu hotplug. I think both statements are true:

- it is true that this is supposed to work with cpu hotplug and these
  functions might be directly affected by cpu hotplug (by a CPU coming
  online later on)

- it is also true that it might not make sense to ASSERT at __init time
  if IRQs are disabled. There might be other places, not affected by cpu
  hotplug, where we do memory allocation at __init time with IRQ
  disabled. It might still be a good idea to add the system_state ==
  SYS_STATE_active check in the ASSERT, not to solve this specific
  problem but to avoid other issues.


In regard to gicv3_lpi_allocate_pendtable, I haven't thought about the
implications of cpu hotplug for LPIs and GICv3 before. Do you envision
that in a CPU hotplug scenario gicv3_lpi_init_rdist would be called when
the extra CPU comes online?

Today gicv3_lpi_init_rdist is called based on the number of
rdist_regions without checking if the CPU is online or offline (I think ?)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [xen-unstable-smoke test] 169781: regressions - FAIL
  2022-04-27 17:10 ` Julien Grall
  2022-04-27 23:02   ` Stefano Stabellini
@ 2022-04-28  7:45   ` Jan Beulich
  2022-04-28  8:45     ` Julien Grall
  1 sibling, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2022-04-28  7:45 UTC (permalink / raw)
  To: Julien Grall, osstest service owner
  Cc: David Vrabel, Stefano Stabellini, Bertrand Marquis, xen-devel

On 27.04.2022 19:10, Julien Grall wrote:
> Hi,
> 
> On 27/04/2022 17:38, osstest service owner wrote:
>> flight 169781 xen-unstable-smoke real [real]
>> flight 169785 xen-unstable-smoke real-retest [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/169781/
>> http://logs.test-lab.xenproject.org/osstest/logs/169785/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>>   test-arm64-arm64-xl-xsm       8 xen-boot                 fail REGR. vs. 169773
> 
> Well, I was overly optimistic :(. This now breaks in the ITS code:
> 
> Apr 27 13:23:14.324831 (XEN) Xen call trace:
> Apr 27 13:23:14.324855 (XEN)    [<000000000022a678>] 
> alloc_xenheap_pages+0x178/0x194 (PC)
> Apr 27 13:23:14.336856 (XEN)    [<000000000022a670>] 
> alloc_xenheap_pages+0x170/0x194 (LR)
> Apr 27 13:23:14.336886 (XEN)    [<0000000000237770>] _xmalloc+0x144/0x294
> Apr 27 13:23:14.348773 (XEN)    [<00000000002378d4>] _xzalloc+0x14/0x30
> Apr 27 13:23:14.348808 (XEN)    [<000000000027b4e4>] 
> gicv3_lpi_init_rdist+0x54/0x324
> Apr 27 13:23:14.348835 (XEN)    [<0000000000279898>] 
> arch/arm/gic-v3.c#gicv3_cpu_init+0x128/0x46c
> Apr 27 13:23:14.360799 (XEN)    [<0000000000279bfc>] 
> arch/arm/gic-v3.c#gicv3_secondary_cpu_init+0x20/0x50
> Apr 27 13:23:14.372796 (XEN)    [<0000000000277054>] 
> gic_init_secondary_cpu+0x18/0x30
> Apr 27 13:23:14.372829 (XEN)    [<0000000000284518>] 
> start_secondary+0x1a8/0x234
> Apr 27 13:23:14.372856 (XEN)    [<0000010722aa4200>] 0000010722aa4200
> Apr 27 13:23:14.384793 (XEN)
> Apr 27 13:23:14.384823 (XEN)
> Apr 27 13:23:14.384845 (XEN) ****************************************
> Apr 27 13:23:14.384869 (XEN) Panic on CPU 2:
> Apr 27 13:23:14.384891 (XEN) Assertion '!in_irq() && 
> (local_irq_is_enabled() || num_online_cpus() <= 1)' failed at 
> common/page_alloc.c:2212
> Apr 27 13:23:14.396805 (XEN) ****************************************
> 
> The GICv3 LPI code contains a few calls to xmalloc() that will be done 
> while initializing the GIC CPU interface. I don't think we can delay the 
> initialization of the LPI part past local_irq_enable(). So I think we 
> will need to allocate the memory when preparing the CPU.

Do you have an explanation why the next flight (169800) passed?

Jan



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [xen-unstable-smoke test] 169781: regressions - FAIL
  2022-04-28  7:45   ` Jan Beulich
@ 2022-04-28  8:45     ` Julien Grall
  0 siblings, 0 replies; 12+ messages in thread
From: Julien Grall @ 2022-04-28  8:45 UTC (permalink / raw)
  To: Jan Beulich, osstest service owner
  Cc: David Vrabel, Stefano Stabellini, Bertrand Marquis, xen-devel

Hi Jan,

On 28/04/2022 08:45, Jan Beulich wrote:
> On 27.04.2022 19:10, Julien Grall wrote:
>> Hi,
>>
>> On 27/04/2022 17:38, osstest service owner wrote:
>>> flight 169781 xen-unstable-smoke real [real]
>>> flight 169785 xen-unstable-smoke real-retest [real]
>>> http://logs.test-lab.xenproject.org/osstest/logs/169781/
>>> http://logs.test-lab.xenproject.org/osstest/logs/169785/
>>>
>>> Regressions :-(
>>>
>>> Tests which did not succeed and are blocking,
>>> including tests which could not be run:
>>>    test-arm64-arm64-xl-xsm       8 xen-boot                 fail REGR. vs. 169773
>>
>> Well, I was overly optimistic :(. This now breaks in the ITS code:
>>
>> Apr 27 13:23:14.324831 (XEN) Xen call trace:
>> Apr 27 13:23:14.324855 (XEN)    [<000000000022a678>]
>> alloc_xenheap_pages+0x178/0x194 (PC)
>> Apr 27 13:23:14.336856 (XEN)    [<000000000022a670>]
>> alloc_xenheap_pages+0x170/0x194 (LR)
>> Apr 27 13:23:14.336886 (XEN)    [<0000000000237770>] _xmalloc+0x144/0x294
>> Apr 27 13:23:14.348773 (XEN)    [<00000000002378d4>] _xzalloc+0x14/0x30
>> Apr 27 13:23:14.348808 (XEN)    [<000000000027b4e4>]
>> gicv3_lpi_init_rdist+0x54/0x324
>> Apr 27 13:23:14.348835 (XEN)    [<0000000000279898>]
>> arch/arm/gic-v3.c#gicv3_cpu_init+0x128/0x46c
>> Apr 27 13:23:14.360799 (XEN)    [<0000000000279bfc>]
>> arch/arm/gic-v3.c#gicv3_secondary_cpu_init+0x20/0x50
>> Apr 27 13:23:14.372796 (XEN)    [<0000000000277054>]
>> gic_init_secondary_cpu+0x18/0x30
>> Apr 27 13:23:14.372829 (XEN)    [<0000000000284518>]
>> start_secondary+0x1a8/0x234
>> Apr 27 13:23:14.372856 (XEN)    [<0000010722aa4200>] 0000010722aa4200
>> Apr 27 13:23:14.384793 (XEN)
>> Apr 27 13:23:14.384823 (XEN)
>> Apr 27 13:23:14.384845 (XEN) ****************************************
>> Apr 27 13:23:14.384869 (XEN) Panic on CPU 2:
>> Apr 27 13:23:14.384891 (XEN) Assertion '!in_irq() &&
>> (local_irq_is_enabled() || num_online_cpus() <= 1)' failed at
>> common/page_alloc.c:2212
>> Apr 27 13:23:14.396805 (XEN) ****************************************
>>
>> The GICv3 LPI code contains a few calls to xmalloc() that will be done
>> while initializing the GIC CPU interface. I don't think we can delay the
>> initialization of the LPI part past local_irq_enable(). So I think we
>> will need to allocate the memory when preparing the CPU.
> 
> Do you have an explanation why the next flight (169800) passed?

The flight 169800 ran on laxtonX (Softiron) which doesn't have a GICv3 ITS.

I thought OSSTest would try to run the next flight on the same HW to 
check heisenbug, but maybe this doesn't happen for the smoke test?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [xen-unstable-smoke test] 169781: regressions - FAIL
  2022-04-28  0:47       ` Stefano Stabellini
@ 2022-04-28 11:19         ` Julien Grall
  2022-04-29  0:41           ` Stefano Stabellini
  0 siblings, 1 reply; 12+ messages in thread
From: Julien Grall @ 2022-04-28 11:19 UTC (permalink / raw)
  To: Stefano Stabellini, Julien Grall
  Cc: osstest service owner, xen-devel, Jan Beulich, David Vrabel,
	Bertrand Marquis

Hi Stefano,

On 28/04/2022 01:47, Stefano Stabellini wrote:
> On Thu, 28 Apr 2022, Julien Grall wrote:
>> Hi Stefano,
>>
>> On Thu, 28 Apr 2022, 00:02 Stefano Stabellini, <sstabellini@kernel.org> wrote
>>        It seems to me that it is acceptable to allocate memory with interrupt
>>        disabled during __init. I cannot see any drawbacks with it. I think we
>>        should change the ASSERT to only trigger after __init: system_state ==
>>        SYS_STATE_active.
>>
>>        What do you think?
>>
>>
>> This would solve the immediate problem but not the long term one (i.e cpu hotplug).
>>
>> So I think it would be better to properly fix it right away.
> 
> Yeah, you are right about cpu hotplug. I think both statements are true:
> 
> - it is true that this is supposed to work with cpu hotplug and these
>    functions might be directly affected by cpu hotplug (by a CPU coming
>    online later on)
> 
> - it is also true that it might not make sense to ASSERT at __init time
>    if IRQs are disabled. There might be other places, not affected by cpu
>    hotplug, where we do memory allocation at __init time with IRQ
>    disabled. It might still be a good idea to add the system_state ==
>    SYS_STATE_active check in the ASSERT, not to solve this specific
>    problem but to avoid other issues.

AFAIU, it is not safe on x86 to do TLB flush with interrupts disabled 
*and* multiple CPUs running. So we can't generically relax the check.

Looking at the OSSTest results, both Arm32 and Arm64 without GICv3 ITS 
tests have passed. So it seems unnecessary to me to preemptively relax 
the check just for Arm.

> 
> 
> In regard to gicv3_lpi_allocate_pendtable, I haven't thought about the
> implications of cpu hotplug for LPIs and GICv3 before. Do you envision
> that in a CPU hotplug scenario gicv3_lpi_init_rdist would be called when
> the extra CPU comes online?

It is already called per-CPU. See gicv3_secondary_cpu_init() -> 
gicv3_cpu_init() -> gicv3_populate_rdist().

> 
> Today gicv3_lpi_init_rdist is called based on the number of
> rdist_regions without checking if the CPU is online or offline (I think ?)

The re-distributors are not banked and therefore accessible by everyone. 
However, in Xen case, each pCPU will only touch its own re-distributor 
(well aside TYPER to figure out the ID).

The loop in gicv3_populate_rdist() will walk throught all the
re-distributor to find which one corresponds to the current pCPU. Once 
we found it, we will call gicv3_lpi_init_rdist() to fully initialize the 
re-distributor.

I don't think we want to populate the memory for each re-distributor in 
advance.

Cheers,


-- 
Julien Grall


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [xen-unstable-smoke test] 169781: regressions - FAIL
  2022-04-28 11:19         ` Julien Grall
@ 2022-04-29  0:41           ` Stefano Stabellini
  2022-04-29  9:04             ` Julien Grall
  0 siblings, 1 reply; 12+ messages in thread
From: Stefano Stabellini @ 2022-04-29  0:41 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Julien Grall, osstest service owner,
	xen-devel, Jan Beulich, David Vrabel, Bertrand Marquis

On Thu, 28 Apr 2022, Julien Grall wrote:
> On 28/04/2022 01:47, Stefano Stabellini wrote:
> > On Thu, 28 Apr 2022, Julien Grall wrote:
> > > Hi Stefano,
> > > 
> > > On Thu, 28 Apr 2022, 00:02 Stefano Stabellini, <sstabellini@kernel.org>
> > > wrote
> > >        It seems to me that it is acceptable to allocate memory with
> > > interrupt
> > >        disabled during __init. I cannot see any drawbacks with it. I think
> > > we
> > >        should change the ASSERT to only trigger after __init: system_state
> > > ==
> > >        SYS_STATE_active.
> > > 
> > >        What do you think?
> > > 
> > > 
> > > This would solve the immediate problem but not the long term one (i.e cpu
> > > hotplug).
> > > 
> > > So I think it would be better to properly fix it right away.
> > 
> > Yeah, you are right about cpu hotplug. I think both statements are true:
> > 
> > - it is true that this is supposed to work with cpu hotplug and these
> >    functions might be directly affected by cpu hotplug (by a CPU coming
> >    online later on)
> > 
> > - it is also true that it might not make sense to ASSERT at __init time
> >    if IRQs are disabled. There might be other places, not affected by cpu
> >    hotplug, where we do memory allocation at __init time with IRQ
> >    disabled. It might still be a good idea to add the system_state ==
> >    SYS_STATE_active check in the ASSERT, not to solve this specific
> >    problem but to avoid other issues.
> 
> AFAIU, it is not safe on x86 to do TLB flush with interrupts disabled *and*
> multiple CPUs running. So we can't generically relax the check.
> 
> Looking at the OSSTest results, both Arm32 and Arm64 without GICv3 ITS tests
> have passed. So it seems unnecessary to me to preemptively relax the check
> just for Arm.

It is good news that it works already (GICv3 aside) on ARM. If you
prefer not to relax it, I am OK with it (although it makes me a bit
worried about future breakages).

 
> > In regard to gicv3_lpi_allocate_pendtable, I haven't thought about the
> > implications of cpu hotplug for LPIs and GICv3 before. Do you envision
> > that in a CPU hotplug scenario gicv3_lpi_init_rdist would be called when
> > the extra CPU comes online?
> 
> It is already called per-CPU. See gicv3_secondary_cpu_init() ->
> gicv3_cpu_init() -> gicv3_populate_rdist().

Got it, thanks!


> > Today gicv3_lpi_init_rdist is called based on the number of
> > rdist_regions without checking if the CPU is online or offline (I think ?)
> 
> The re-distributors are not banked and therefore accessible by everyone.
> However, in Xen case, each pCPU will only touch its own re-distributor (well
> aside TYPER to figure out the ID).
> 
> The loop in gicv3_populate_rdist() will walk throught all the
> re-distributor to find which one corresponds to the current pCPU. Once we
> found it, we will call gicv3_lpi_init_rdist() to fully initialize the
> re-distributor.
> 
> I don't think we want to populate the memory for each re-distributor in
> advance.

I agree.

Currently we do:

    start_secondary
        [...]
        gic_init_secondary_cpu()
            [...]
            gicv3_lpi_init_rdist()
        [...]
        local_irq_enable();

Which seems to be the right sequence to me. There must be an early boot
phase where interrupts are disabled on a CPU but memory allocations are
possible. If this was x86 with the tlbflush limitation, I would suggest
to have per-cpu memory mapping areas so that we don't have to do any
global tlb flushes with interrupts disabled.

On ARM, we don't have the tlbflush limitation so we could do that but we
wouldn't have much to gain from it.

Also, this seems to be a bit of a special case, because in general we
can move drivers initializations later after local_irq_enable(). But
this is the interrupt controller driver itself -- we cannot move it
after local_irq_enable().

So maybe an ad-hoc solution could be acceptable?

The only one I can think of is to check on system_state ==
SYS_STATE_active now. In the future for CPU hotplug we could have a
per-CPU system_state, like cpu_system_state, and do a similar check.

I am totally open to other ideas, I couldn't come up with anything
better at the moment.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [xen-unstable-smoke test] 169781: regressions - FAIL
  2022-04-29  0:41           ` Stefano Stabellini
@ 2022-04-29  9:04             ` Julien Grall
  2022-04-29 16:15               ` Stefano Stabellini
  0 siblings, 1 reply; 12+ messages in thread
From: Julien Grall @ 2022-04-29  9:04 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Julien Grall, osstest service owner, xen-devel, Jan Beulich,
	David Vrabel, Bertrand Marquis

Hi Stefano,

On 29/04/2022 01:41, Stefano Stabellini wrote:
> On Thu, 28 Apr 2022, Julien Grall wrote:
>> On 28/04/2022 01:47, Stefano Stabellini wrote:
>>> On Thu, 28 Apr 2022, Julien Grall wrote:
>>>> Hi Stefano,
>>>>
>>>> On Thu, 28 Apr 2022, 00:02 Stefano Stabellini, <sstabellini@kernel.org>
>>>> wrote
>>>>         It seems to me that it is acceptable to allocate memory with
>>>> interrupt
>>>>         disabled during __init. I cannot see any drawbacks with it. I think
>>>> we
>>>>         should change the ASSERT to only trigger after __init: system_state
>>>> ==
>>>>         SYS_STATE_active.
>>>>
>>>>         What do you think?
>>>>
>>>>
>>>> This would solve the immediate problem but not the long term one (i.e cpu
>>>> hotplug).
>>>>
>>>> So I think it would be better to properly fix it right away.
>>>
>>> Yeah, you are right about cpu hotplug. I think both statements are true:
>>>
>>> - it is true that this is supposed to work with cpu hotplug and these
>>>     functions might be directly affected by cpu hotplug (by a CPU coming
>>>     online later on)
>>>
>>> - it is also true that it might not make sense to ASSERT at __init time
>>>     if IRQs are disabled. There might be other places, not affected by cpu
>>>     hotplug, where we do memory allocation at __init time with IRQ
>>>     disabled. It might still be a good idea to add the system_state ==
>>>     SYS_STATE_active check in the ASSERT, not to solve this specific
>>>     problem but to avoid other issues.
>>
>> AFAIU, it is not safe on x86 to do TLB flush with interrupts disabled *and*
>> multiple CPUs running. So we can't generically relax the check.
>>
>> Looking at the OSSTest results, both Arm32 and Arm64 without GICv3 ITS tests
>> have passed. So it seems unnecessary to me to preemptively relax the check
>> just for Arm.
> 
> It is good news that it works already (GICv3 aside) on ARM. If you
> prefer not to relax it, I am OK with it (although it makes me a bit
> worried about future breakages).

Bear in mind this is a debug only breakage, production build will work 
fines with any ASSERT() affecting large code base, it is going to be 
difficult to find all the potential misuse. So we have to rely on wider 
testing and fix it as it gets reported.

If we relax the check, then we are never going to be able to harden the 
code in timely maneer.

>>> In regard to gicv3_lpi_allocate_pendtable, I haven't thought about the
>>> implications of cpu hotplug for LPIs and GICv3 before. Do you envision
>>> that in a CPU hotplug scenario gicv3_lpi_init_rdist would be called when
>>> the extra CPU comes online?
>>
>> It is already called per-CPU. See gicv3_secondary_cpu_init() ->
>> gicv3_cpu_init() -> gicv3_populate_rdist().
> 
> Got it, thanks!
> 
> 
>>> Today gicv3_lpi_init_rdist is called based on the number of
>>> rdist_regions without checking if the CPU is online or offline (I think ?)
>>
>> The re-distributors are not banked and therefore accessible by everyone.
>> However, in Xen case, each pCPU will only touch its own re-distributor (well
>> aside TYPER to figure out the ID).
>>
>> The loop in gicv3_populate_rdist() will walk throught all the
>> re-distributor to find which one corresponds to the current pCPU. Once we
>> found it, we will call gicv3_lpi_init_rdist() to fully initialize the
>> re-distributor.
>>
>> I don't think we want to populate the memory for each re-distributor in
>> advance.
> 
> I agree.
> 
> Currently we do:
> 
>      start_secondary
>          [...]
>          gic_init_secondary_cpu()
>              [...]
>              gicv3_lpi_init_rdist()
>          [...]
>          local_irq_enable();
> 
> Which seems to be the right sequence to me. There must be an early boot
> phase where interrupts are disabled on a CPU but memory allocations are
> possible. If this was x86 with the tlbflush limitation, I would suggest
> to have per-cpu memory mapping areas so that we don't have to do any
> global tlb flushes with interrupts disabled.
> 
> On ARM, we don't have the tlbflush limitation so we could do that but we
> wouldn't have much to gain from it.
> 
> Also, this seems to be a bit of a special case, because in general we
> can move drivers initializations later after local_irq_enable(). But
> this is the interrupt controller driver itself -- we cannot move it
> after local_irq_enable().
> 
> So maybe an ad-hoc solution could be acceptable?

We don't need any ad-hoc solution here. We can register a CPU notifier 
that will notify us when a CPU will be prepared. Something like below 
should work (untested yet):

diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
index e1594dd20e4c..ccf4868540f5 100644
--- a/xen/arch/arm/gic-v3-lpi.c
+++ b/xen/arch/arm/gic-v3-lpi.c
@@ -18,6 +18,7 @@
   * along with this program; If not, see <http://www.gnu.org/licenses/>.
   */

+#include <xen/cpu.h>
  #include <xen/lib.h>
  #include <xen/mm.h>
  #include <xen/param.h>
@@ -234,18 +235,13 @@ void gicv3_lpi_update_host_entry(uint32_t 
host_lpi, int domain_id,
      write_u64_atomic(&hlpip->data, hlpi.data);
  }

-static int gicv3_lpi_allocate_pendtable(uint64_t *reg)
+static int gicv3_lpi_allocate_pendtable(unsigned int cpu)
  {
-    uint64_t val;
      void *pendtable;

-    if ( this_cpu(lpi_redist).pending_table )
+    if ( per_cpu(lpi_redist, cpu).pending_table )
          return -EBUSY;

-    val  = GIC_BASER_CACHE_RaWaWb << 
GICR_PENDBASER_INNER_CACHEABILITY_SHIFT;
-    val |= GIC_BASER_CACHE_SameAsInner << 
GICR_PENDBASER_OUTER_CACHEABILITY_SHIFT;
-    val |= GIC_BASER_InnerShareable << GICR_PENDBASER_SHAREABILITY_SHIFT;
-
      /*
       * The pending table holds one bit per LPI and even covers bits for
       * interrupt IDs below 8192, so we allocate the full range.
@@ -265,13 +261,38 @@ static int gicv3_lpi_allocate_pendtable(uint64_t *reg)
      clean_and_invalidate_dcache_va_range(pendtable,
                                           lpi_data.max_host_lpi_ids / 8);

-    this_cpu(lpi_redist).pending_table = pendtable;
+    per_cpu(lpi_redist, cpu).pending_table = pendtable;

-    val |= GICR_PENDBASER_PTZ;
+    return 0;
+}
+
+static int gicv3_lpi_set_pendtable(void __iomem *rdist_base)
+{
+    const void *pendtable = this_cpu(lpi_redist).pending_table;
+    uint64_t val;
+
+    if ( !pendtable )
+        return -ENOMEM;

+    ASSERT(virt_to_maddr(pendtable) & ~GENMASK(51, 16));
+
+    val  = GIC_BASER_CACHE_RaWaWb << 
GICR_PENDBASER_INNER_CACHEABILITY_SHIFT;
+    val |= GIC_BASER_CACHE_SameAsInner << 
GICR_PENDBASER_OUTER_CACHEABILITY_SHIFT;
+    val |= GIC_BASER_InnerShareable << GICR_PENDBASER_SHAREABILITY_SHIFT;
+    val |= GICR_PENDBASER_PTZ;
      val |= virt_to_maddr(pendtable);

-    *reg = val;
+    writeq_relaxed(val, rdist_base + GICR_PENDBASER);
+    val = readq_relaxed(rdist_base + GICR_PENDBASER);
+
+    /* If the hardware reports non-shareable, drop cacheability as well. */
+    if ( !(val & GICR_PENDBASER_SHAREABILITY_MASK) )
+    {
+        val &= ~GICR_PENDBASER_INNER_CACHEABILITY_MASK;
+        val |= GIC_BASER_CACHE_nC << 
GICR_PENDBASER_INNER_CACHEABILITY_SHIFT;
+
+        writeq_relaxed(val, rdist_base + GICR_PENDBASER);
+    }

      return 0;
  }
@@ -340,7 +361,6 @@ static int gicv3_lpi_set_proptable(void __iomem * 
rdist_base)
  int gicv3_lpi_init_rdist(void __iomem * rdist_base)
  {
      uint32_t reg;
-    uint64_t table_reg;
      int ret;

      /* We don't support LPIs without an ITS. */
@@ -352,24 +372,33 @@ int gicv3_lpi_init_rdist(void __iomem * rdist_base)
      if ( reg & GICR_CTLR_ENABLE_LPIS )
          return -EBUSY;

-    ret = gicv3_lpi_allocate_pendtable(&table_reg);
+    ret = gicv3_lpi_set_pendtable(rdist_base);
      if ( ret )
          return ret;
-    writeq_relaxed(table_reg, rdist_base + GICR_PENDBASER);
-    table_reg = readq_relaxed(rdist_base + GICR_PENDBASER);

-    /* If the hardware reports non-shareable, drop cacheability as well. */
-    if ( !(table_reg & GICR_PENDBASER_SHAREABILITY_MASK) )
-    {
-        table_reg &= ~GICR_PENDBASER_INNER_CACHEABILITY_MASK;
-        table_reg |= GIC_BASER_CACHE_nC << 
GICR_PENDBASER_INNER_CACHEABILITY_SHIFT;
+    return gicv3_lpi_set_proptable(rdist_base);
+}
+
+static int cpu_callback(struct notifier_block *nfb, unsigned long action,
+                        void *hcpu)
+{
+    unsigned long cpu = (unsigned long)hcpu;
+    int rc = 0;

-        writeq_relaxed(table_reg, rdist_base + GICR_PENDBASER);
+    switch ( action )
+    {
+    case CPU_UP_PREPARE:
+        rc = gicv3_lpi_allocate_pendtable(cpu);
+        break;
      }

-    return gicv3_lpi_set_proptable(rdist_base);
+    return !rc ? NOTIFY_DONE : notifier_from_errno(rc);
  }

+static struct notifier_block cpu_nfb = {
+    .notifier_call = cpu_callback,
+};
+
  static unsigned int max_lpi_bits = 20;
  integer_param("max_lpi_bits", max_lpi_bits);

@@ -381,6 +410,7 @@ integer_param("max_lpi_bits", max_lpi_bits);
  int gicv3_lpi_init_host_lpis(unsigned int host_lpi_bits)
  {
      unsigned int nr_lpi_ptrs;
+    int rc;

      /* We rely on the data structure being atomically accessible. */
      BUILD_BUG_ON(sizeof(union host_lpi) > sizeof(unsigned long));
@@ -413,7 +443,14 @@ int gicv3_lpi_init_host_lpis(unsigned int 
host_lpi_bits)

      printk("GICv3: using at most %lu LPIs on the host.\n", 
MAX_NR_HOST_LPIS);

-    return 0;
+    /* Register the CPU notifier and allocate memory for the boot CPU */
+    register_cpu_notifier(&cpu_nfb);
+    rc = gicv3_lpi_allocate_pendtable(smp_processor_id());
+    if ( rc )
+        printk(XENLOG_ERR "Unable to allocate the pendtable for CPU%u\n",
+               smp_processor_id());
+
+    return rc;
  }

  static int find_unused_host_lpi(uint32_t start, uint32_t *index)

Cheers,

-- 
Julien Grall


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [xen-unstable-smoke test] 169781: regressions - FAIL
  2022-04-29  9:04             ` Julien Grall
@ 2022-04-29 16:15               ` Stefano Stabellini
  2022-04-29 16:47                 ` Julien Grall
  0 siblings, 1 reply; 12+ messages in thread
From: Stefano Stabellini @ 2022-04-29 16:15 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Julien Grall, osstest service owner,
	xen-devel, Jan Beulich, David Vrabel, Bertrand Marquis

On Fri, 29 Apr 2022, Julien Grall wrote:
> On 29/04/2022 01:41, Stefano Stabellini wrote:
> > On Thu, 28 Apr 2022, Julien Grall wrote:
> > > On 28/04/2022 01:47, Stefano Stabellini wrote:
> > > > On Thu, 28 Apr 2022, Julien Grall wrote:
> > > > > Hi Stefano,
> > > > > 
> > > > > On Thu, 28 Apr 2022, 00:02 Stefano Stabellini,
> > > > > <sstabellini@kernel.org>
> > > > > wrote
> > > > >         It seems to me that it is acceptable to allocate memory with
> > > > > interrupt
> > > > >         disabled during __init. I cannot see any drawbacks with it. I
> > > > > think
> > > > > we
> > > > >         should change the ASSERT to only trigger after __init:
> > > > > system_state
> > > > > ==
> > > > >         SYS_STATE_active.
> > > > > 
> > > > >         What do you think?
> > > > > 
> > > > > 
> > > > > This would solve the immediate problem but not the long term one (i.e
> > > > > cpu
> > > > > hotplug).
> > > > > 
> > > > > So I think it would be better to properly fix it right away.
> > > > 
> > > > Yeah, you are right about cpu hotplug. I think both statements are true:
> > > > 
> > > > - it is true that this is supposed to work with cpu hotplug and these
> > > >     functions might be directly affected by cpu hotplug (by a CPU coming
> > > >     online later on)
> > > > 
> > > > - it is also true that it might not make sense to ASSERT at __init time
> > > >     if IRQs are disabled. There might be other places, not affected by
> > > > cpu
> > > >     hotplug, where we do memory allocation at __init time with IRQ
> > > >     disabled. It might still be a good idea to add the system_state ==
> > > >     SYS_STATE_active check in the ASSERT, not to solve this specific
> > > >     problem but to avoid other issues.
> > > 
> > > AFAIU, it is not safe on x86 to do TLB flush with interrupts disabled
> > > *and*
> > > multiple CPUs running. So we can't generically relax the check.
> > > 
> > > Looking at the OSSTest results, both Arm32 and Arm64 without GICv3 ITS
> > > tests
> > > have passed. So it seems unnecessary to me to preemptively relax the check
> > > just for Arm.
> > 
> > It is good news that it works already (GICv3 aside) on ARM. If you
> > prefer not to relax it, I am OK with it (although it makes me a bit
> > worried about future breakages).
> 
> Bear in mind this is a debug only breakage, production build will work fines
> with any ASSERT() affecting large code base, it is going to be difficult to
> find all the potential misuse. So we have to rely on wider testing and fix it
> as it gets reported.
> 
> If we relax the check, then we are never going to be able to harden the code
> in timely maneer.
> 
> > > > In regard to gicv3_lpi_allocate_pendtable, I haven't thought about the
> > > > implications of cpu hotplug for LPIs and GICv3 before. Do you envision
> > > > that in a CPU hotplug scenario gicv3_lpi_init_rdist would be called when
> > > > the extra CPU comes online?
> > > 
> > > It is already called per-CPU. See gicv3_secondary_cpu_init() ->
> > > gicv3_cpu_init() -> gicv3_populate_rdist().
> > 
> > Got it, thanks!
> > 
> > 
> > > > Today gicv3_lpi_init_rdist is called based on the number of
> > > > rdist_regions without checking if the CPU is online or offline (I think
> > > > ?)
> > > 
> > > The re-distributors are not banked and therefore accessible by everyone.
> > > However, in Xen case, each pCPU will only touch its own re-distributor
> > > (well
> > > aside TYPER to figure out the ID).
> > > 
> > > The loop in gicv3_populate_rdist() will walk throught all the
> > > re-distributor to find which one corresponds to the current pCPU. Once we
> > > found it, we will call gicv3_lpi_init_rdist() to fully initialize the
> > > re-distributor.
> > > 
> > > I don't think we want to populate the memory for each re-distributor in
> > > advance.
> > 
> > I agree.
> > 
> > Currently we do:
> > 
> >      start_secondary
> >          [...]
> >          gic_init_secondary_cpu()
> >              [...]
> >              gicv3_lpi_init_rdist()
> >          [...]
> >          local_irq_enable();
> > 
> > Which seems to be the right sequence to me. There must be an early boot
> > phase where interrupts are disabled on a CPU but memory allocations are
> > possible. If this was x86 with the tlbflush limitation, I would suggest
> > to have per-cpu memory mapping areas so that we don't have to do any
> > global tlb flushes with interrupts disabled.
> > 
> > On ARM, we don't have the tlbflush limitation so we could do that but we
> > wouldn't have much to gain from it.
> > 
> > Also, this seems to be a bit of a special case, because in general we
> > can move drivers initializations later after local_irq_enable(). But
> > this is the interrupt controller driver itself -- we cannot move it
> > after local_irq_enable().
> > 
> > So maybe an ad-hoc solution could be acceptable?
> 
> We don't need any ad-hoc solution here. We can register a CPU notifier that
> will notify us when a CPU will be prepared. Something like below should work
> (untested yet):

The CPU notifier is a good idea. It looks like the patch below got
corrupted somehow by the mailer. If you send it as a proper patch I am
happy to have a look.


> diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
> index e1594dd20e4c..ccf4868540f5 100644
> --- a/xen/arch/arm/gic-v3-lpi.c
> +++ b/xen/arch/arm/gic-v3-lpi.c
> @@ -18,6 +18,7 @@
>   * along with this program; If not, see <http://www.gnu.org/licenses/>.
>   */
> 
> +#include <xen/cpu.h>
>  #include <xen/lib.h>
>  #include <xen/mm.h>
>  #include <xen/param.h>
> @@ -234,18 +235,13 @@ void gicv3_lpi_update_host_entry(uint32_t host_lpi, int
> domain_id,
>      write_u64_atomic(&hlpip->data, hlpi.data);
>  }
> 
> -static int gicv3_lpi_allocate_pendtable(uint64_t *reg)
> +static int gicv3_lpi_allocate_pendtable(unsigned int cpu)
>  {
> -    uint64_t val;
>      void *pendtable;
> 
> -    if ( this_cpu(lpi_redist).pending_table )
> +    if ( per_cpu(lpi_redist, cpu).pending_table )
>          return -EBUSY;
> 
> -    val  = GIC_BASER_CACHE_RaWaWb << GICR_PENDBASER_INNER_CACHEABILITY_SHIFT;
> -    val |= GIC_BASER_CACHE_SameAsInner <<
> GICR_PENDBASER_OUTER_CACHEABILITY_SHIFT;
> -    val |= GIC_BASER_InnerShareable << GICR_PENDBASER_SHAREABILITY_SHIFT;
> -
>      /*
>       * The pending table holds one bit per LPI and even covers bits for
>       * interrupt IDs below 8192, so we allocate the full range.
> @@ -265,13 +261,38 @@ static int gicv3_lpi_allocate_pendtable(uint64_t *reg)
>      clean_and_invalidate_dcache_va_range(pendtable,
>                                           lpi_data.max_host_lpi_ids / 8);
> 
> -    this_cpu(lpi_redist).pending_table = pendtable;
> +    per_cpu(lpi_redist, cpu).pending_table = pendtable;
> 
> -    val |= GICR_PENDBASER_PTZ;
> +    return 0;
> +}
> +
> +static int gicv3_lpi_set_pendtable(void __iomem *rdist_base)
> +{
> +    const void *pendtable = this_cpu(lpi_redist).pending_table;
> +    uint64_t val;
> +
> +    if ( !pendtable )
> +        return -ENOMEM;
> 
> +    ASSERT(virt_to_maddr(pendtable) & ~GENMASK(51, 16));
> +
> +    val  = GIC_BASER_CACHE_RaWaWb << GICR_PENDBASER_INNER_CACHEABILITY_SHIFT;
> +    val |= GIC_BASER_CACHE_SameAsInner <<
> GICR_PENDBASER_OUTER_CACHEABILITY_SHIFT;
> +    val |= GIC_BASER_InnerShareable << GICR_PENDBASER_SHAREABILITY_SHIFT;
> +    val |= GICR_PENDBASER_PTZ;
>      val |= virt_to_maddr(pendtable);
> 
> -    *reg = val;
> +    writeq_relaxed(val, rdist_base + GICR_PENDBASER);
> +    val = readq_relaxed(rdist_base + GICR_PENDBASER);
> +
> +    /* If the hardware reports non-shareable, drop cacheability as well. */
> +    if ( !(val & GICR_PENDBASER_SHAREABILITY_MASK) )
> +    {
> +        val &= ~GICR_PENDBASER_INNER_CACHEABILITY_MASK;
> +        val |= GIC_BASER_CACHE_nC << GICR_PENDBASER_INNER_CACHEABILITY_SHIFT;
> +
> +        writeq_relaxed(val, rdist_base + GICR_PENDBASER);
> +    }
> 
>      return 0;
>  }
> @@ -340,7 +361,6 @@ static int gicv3_lpi_set_proptable(void __iomem *
> rdist_base)
>  int gicv3_lpi_init_rdist(void __iomem * rdist_base)
>  {
>      uint32_t reg;
> -    uint64_t table_reg;
>      int ret;
> 
>      /* We don't support LPIs without an ITS. */
> @@ -352,24 +372,33 @@ int gicv3_lpi_init_rdist(void __iomem * rdist_base)
>      if ( reg & GICR_CTLR_ENABLE_LPIS )
>          return -EBUSY;
> 
> -    ret = gicv3_lpi_allocate_pendtable(&table_reg);
> +    ret = gicv3_lpi_set_pendtable(rdist_base);
>      if ( ret )
>          return ret;
> -    writeq_relaxed(table_reg, rdist_base + GICR_PENDBASER);
> -    table_reg = readq_relaxed(rdist_base + GICR_PENDBASER);
> 
> -    /* If the hardware reports non-shareable, drop cacheability as well. */
> -    if ( !(table_reg & GICR_PENDBASER_SHAREABILITY_MASK) )
> -    {
> -        table_reg &= ~GICR_PENDBASER_INNER_CACHEABILITY_MASK;
> -        table_reg |= GIC_BASER_CACHE_nC <<
> GICR_PENDBASER_INNER_CACHEABILITY_SHIFT;
> +    return gicv3_lpi_set_proptable(rdist_base);
> +}
> +
> +static int cpu_callback(struct notifier_block *nfb, unsigned long action,
> +                        void *hcpu)
> +{
> +    unsigned long cpu = (unsigned long)hcpu;
> +    int rc = 0;
> 
> -        writeq_relaxed(table_reg, rdist_base + GICR_PENDBASER);
> +    switch ( action )
> +    {
> +    case CPU_UP_PREPARE:
> +        rc = gicv3_lpi_allocate_pendtable(cpu);
> +        break;
>      }
> 
> -    return gicv3_lpi_set_proptable(rdist_base);
> +    return !rc ? NOTIFY_DONE : notifier_from_errno(rc);
>  }
> 
> +static struct notifier_block cpu_nfb = {
> +    .notifier_call = cpu_callback,
> +};
> +
>  static unsigned int max_lpi_bits = 20;
>  integer_param("max_lpi_bits", max_lpi_bits);
> 
> @@ -381,6 +410,7 @@ integer_param("max_lpi_bits", max_lpi_bits);
>  int gicv3_lpi_init_host_lpis(unsigned int host_lpi_bits)
>  {
>      unsigned int nr_lpi_ptrs;
> +    int rc;
> 
>      /* We rely on the data structure being atomically accessible. */
>      BUILD_BUG_ON(sizeof(union host_lpi) > sizeof(unsigned long));
> @@ -413,7 +443,14 @@ int gicv3_lpi_init_host_lpis(unsigned int host_lpi_bits)
> 
>      printk("GICv3: using at most %lu LPIs on the host.\n", MAX_NR_HOST_LPIS);
> 
> -    return 0;
> +    /* Register the CPU notifier and allocate memory for the boot CPU */
> +    register_cpu_notifier(&cpu_nfb);
> +    rc = gicv3_lpi_allocate_pendtable(smp_processor_id());
> +    if ( rc )
> +        printk(XENLOG_ERR "Unable to allocate the pendtable for CPU%u\n",
> +               smp_processor_id());
> +
> +    return rc;
>  }
> 
>  static int find_unused_host_lpi(uint32_t start, uint32_t *index)
> 
> Cheers,
> 
> -- 
> Julien Grall
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [xen-unstable-smoke test] 169781: regressions - FAIL
  2022-04-29 16:15               ` Stefano Stabellini
@ 2022-04-29 16:47                 ` Julien Grall
  0 siblings, 0 replies; 12+ messages in thread
From: Julien Grall @ 2022-04-29 16:47 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Julien Grall, osstest service owner, xen-devel, Jan Beulich,
	David Vrabel, Bertrand Marquis



On 29/04/2022 17:15, Stefano Stabellini wrote:
>>> Which seems to be the right sequence to me. There must be an early boot
>>> phase where interrupts are disabled on a CPU but memory allocations are
>>> possible. If this was x86 with the tlbflush limitation, I would suggest
>>> to have per-cpu memory mapping areas so that we don't have to do any
>>> global tlb flushes with interrupts disabled.
>>>
>>> On ARM, we don't have the tlbflush limitation so we could do that but we
>>> wouldn't have much to gain from it.
>>>
>>> Also, this seems to be a bit of a special case, because in general we
>>> can move drivers initializations later after local_irq_enable(). But
>>> this is the interrupt controller driver itself -- we cannot move it
>>> after local_irq_enable().
>>>
>>> So maybe an ad-hoc solution could be acceptable?
>>
>> We don't need any ad-hoc solution here. We can register a CPU notifier that
>> will notify us when a CPU will be prepared. Something like below should work
>> (untested yet):
> 
> The CPU notifier is a good idea. It looks like the patch below got
> corrupted somehow by the mailer. If you send it as a proper patch I am
> happy to have a look.

Doh. I will send a proper patch once I have done some testing.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-04-29 16:47 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-27 16:38 [xen-unstable-smoke test] 169781: regressions - FAIL osstest service owner
2022-04-27 17:10 ` Julien Grall
2022-04-27 23:02   ` Stefano Stabellini
2022-04-27 23:13     ` Julien Grall
2022-04-28  0:47       ` Stefano Stabellini
2022-04-28 11:19         ` Julien Grall
2022-04-29  0:41           ` Stefano Stabellini
2022-04-29  9:04             ` Julien Grall
2022-04-29 16:15               ` Stefano Stabellini
2022-04-29 16:47                 ` Julien Grall
2022-04-28  7:45   ` Jan Beulich
2022-04-28  8:45     ` Julien Grall

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.