* Discussion of Xenheap problems on AArch64
@ 2021-04-21  6:28 Henry Wang
  2021-04-21  9:03 ` Julien Grall
  0 siblings, 1 reply; 15+ messages in thread
From: Henry Wang @ 2021-04-21  6:28 UTC (permalink / raw)
  To: julien, sstabellini, xen-devel; +Cc: Wei Chen, Penny Zheng, Bertrand Marquis

Hi,

We are trying to implement static memory allocation on AArch64. Part of
this feature is reserved heap memory allocation, where a specific range
of memory is reserved only for the heap. During development, we found a
pitfall in the current AArch64 setup_xenheap_mappings() function.

According to a previous discussion in the community
https://lore.kernel.org/xen-devel/20190216134456.10681-1-peng.fan@nxp.com/,
on AArch64, bootmem is initialized after setup_xenheap_mappings(), so
setup_xenheap_mappings() may try to allocate memory before any memory has
been handed over to the boot allocator. If reserved heap memory allocation
is introduced, either of the 2 cases below will trigger a crash:

1. If the reserved heap memory is at the end of the memory block list and
the gap between reserved and unreserved memory is bigger than 512GB, then
when we set up mappings from the beginning of the memory block list, the
boot allocator runs out of pages (OOM). This is because the memory
reserved for the heap has not yet been mapped and added to the boot
allocator.

2. If we add the memory reserved for the heap to the boot allocator first,
and then set up mappings for the banks in the memory block list, we may
get a page that has not been mapped yet, causing a data abort.

Also, according to Julien's reply in a previous mailing list discussion,
we are meant to support up to 5TB of RAM (see
https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg00881.html).
Therefore, we think it may be time to revisit this problem and try to
find a proper way to address it. Any comments?

Kind regards,

Henry


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Discussion of Xenheap problems on AArch64
  2021-04-21  6:28 Discussion of Xenheap problems on AArch64 Henry Wang
@ 2021-04-21  9:03 ` Julien Grall
  2021-04-21  9:32   ` Henry Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Julien Grall @ 2021-04-21  9:03 UTC (permalink / raw)
  To: Henry Wang, sstabellini, xen-devel
  Cc: Wei Chen, Penny Zheng, Bertrand Marquis



On 21/04/2021 07:28, Henry Wang wrote:
> Hi,

Hi Henry,

> 
> We are trying to implement static memory allocation on AArch64. Part of
> this feature is reserved heap memory allocation, where a specific range
> of memory is reserved only for the heap. During development, we found a
> pitfall in the current AArch64 setup_xenheap_mappings() function.
> 
> According to a previous discussion in the community
> https://lore.kernel.org/xen-devel/20190216134456.10681-1-peng.fan@nxp.com/,
> on AArch64, bootmem is initialized after setup_xenheap_mappings(), so
> setup_xenheap_mappings() may try to allocate memory before any memory has
> been handed over to the boot allocator. If reserved heap memory allocation
> is introduced, either of the 2 cases below will trigger a crash:
> 
> 1. If the reserved heap memory is at the end of the memory block list and
> the gap between reserved and unreserved memory is bigger than 512GB, then
> when we set up mappings from the beginning of the memory block list, the
> boot allocator runs out of pages (OOM). This is because the memory
> reserved for the heap has not yet been mapped and added to the boot
> allocator.
> 
> 2. If we add the memory reserved for the heap to the boot allocator first,
> and then set up mappings for the banks in the memory block list, we may
> get a page that has not been mapped yet, causing a data abort.

There are a few issues with setup_xenheap_mappings(). I have been
reworking the code in my spare time and have started to upstream bits
of it. A PoC can be found here:

https://xenbits.xen.org/gitweb/?p=people/julieng/xen-unstable.git;a=shortlog;h=refs/heads/pt/dev

Cheers,

-- 
Julien Grall



* RE: Discussion of Xenheap problems on AArch64
  2021-04-21  9:03 ` Julien Grall
@ 2021-04-21  9:32   ` Henry Wang
  2021-04-25 20:19     ` Julien Grall
  0 siblings, 1 reply; 15+ messages in thread
From: Henry Wang @ 2021-04-21  9:32 UTC (permalink / raw)
  To: Julien Grall, sstabellini, xen-devel
  Cc: Wei Chen, Penny Zheng, Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: Wednesday, April 21, 2021 5:04 PM
> To: Henry Wang <Henry.Wang@arm.com>; sstabellini@kernel.org; xen-devel@lists.xenproject.org
> Cc: Wei Chen <Wei.Chen@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: Discussion of Xenheap problems on AArch64
> 
> 
> 
> On 21/04/2021 07:28, Henry Wang wrote:
> > Hi,
> 
> Hi Henry,
> 
> >
> > We are trying to implement static memory allocation on AArch64. Part of
> > this feature is reserved heap memory allocation, where a specific range
> > of memory is reserved only for the heap. During development, we found a
> > pitfall in the current AArch64 setup_xenheap_mappings() function.
> >
> > According to a previous discussion in the community
> > https://lore.kernel.org/xen-devel/20190216134456.10681-1-peng.fan@nxp.com/,
> > on AArch64, bootmem is initialized after setup_xenheap_mappings(), so
> > setup_xenheap_mappings() may try to allocate memory before any memory has
> > been handed over to the boot allocator. If reserved heap memory allocation
> > is introduced, either of the 2 cases below will trigger a crash:
> >
> > 1. If the reserved heap memory is at the end of the memory block list and
> > the gap between reserved and unreserved memory is bigger than 512GB, then
> > when we set up mappings from the beginning of the memory block list, the
> > boot allocator runs out of pages (OOM). This is because the memory
> > reserved for the heap has not yet been mapped and added to the boot
> > allocator.
> >
> > 2. If we add the memory reserved for the heap to the boot allocator first,
> > and then set up mappings for the banks in the memory block list, we may
> > get a page that has not been mapped yet, causing a data abort.
> 
> There are a few issues with setup_xenheap_mappings(). I have been
> reworking the code in my spare time and have started to upstream bits
> of it. A PoC can be found here:
> 
> https://xenbits.xen.org/gitweb/?p=people/julieng/xen-unstable.git;a=shortlog;h=refs/heads/pt/dev
> 

Really great news! Thank you very much for the information and your hard
work on the PoC :) I will start going through your PoC code then.

Kind regards,
Henry

> Cheers,
> 
> --
> Julien Grall


* Re: Discussion of Xenheap problems on AArch64
  2021-04-21  9:32   ` Henry Wang
@ 2021-04-25 20:19     ` Julien Grall
  2021-04-27  6:29       ` Henry Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Julien Grall @ 2021-04-25 20:19 UTC (permalink / raw)
  To: Henry Wang, sstabellini, xen-devel
  Cc: Wei Chen, Penny Zheng, Bertrand Marquis



On 21/04/2021 10:32, Henry Wang wrote:
> Hi Julien,

Hi Henry,

>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: Wednesday, April 21, 2021 5:04 PM
>> To: Henry Wang <Henry.Wang@arm.com>; sstabellini@kernel.org; xen-devel@lists.xenproject.org
>> Cc: Wei Chen <Wei.Chen@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; Bertrand Marquis <Bertrand.Marquis@arm.com>
>> Subject: Re: Discussion of Xenheap problems on AArch64
>>
>>
>>
>> On 21/04/2021 07:28, Henry Wang wrote:
>>> Hi,
>>
>> Hi Henry,
>>
>>>
>>> We are trying to implement static memory allocation on AArch64. Part of
>>> this feature is reserved heap memory allocation, where a specific range
>>> of memory is reserved only for the heap. During development, we found a
>>> pitfall in the current AArch64 setup_xenheap_mappings() function.
>>>
>>> According to a previous discussion in the community
>>> https://lore.kernel.org/xen-devel/20190216134456.10681-1-peng.fan@nxp.com/,
>>> on AArch64, bootmem is initialized after setup_xenheap_mappings(), so
>>> setup_xenheap_mappings() may try to allocate memory before any memory has
>>> been handed over to the boot allocator. If reserved heap memory allocation
>>> is introduced, either of the 2 cases below will trigger a crash:
>>>
>>> 1. If the reserved heap memory is at the end of the memory block list and
>>> the gap between reserved and unreserved memory is bigger than 512GB, then
>>> when we set up mappings from the beginning of the memory block list, the
>>> boot allocator runs out of pages (OOM). This is because the memory
>>> reserved for the heap has not yet been mapped and added to the boot
>>> allocator.
>>>
>>> 2. If we add the memory reserved for the heap to the boot allocator first,
>>> and then set up mappings for the banks in the memory block list, we may
>>> get a page that has not been mapped yet, causing a data abort.
>>
>> There are a few issues with setup_xenheap_mappings(). I have been
>> reworking the code in my spare time and have started to upstream bits
>> of it. A PoC can be found here:
>>
>> https://xenbits.xen.org/gitweb/?p=people/julieng/xen-unstable.git;a=shortlog;h=refs/heads/pt/dev
>>
> 
> Really great news! Thank you very much for the information and your hard
> work on the PoC :) I will start going through your PoC code then.

I spent some time today cleaning up the PoC and sent a series to the ML
(see [1]). It has only been lightly tested so far.

Would you be able to give it a try and let me know if it helps with your
problem?

For convenience, I have pushed a branch with the series applied here:

https://xenbits.xen.org/gitweb/?p=people/julieng/xen-unstable.git;a=shortlog;h=refs/heads/pt/rfc-v2

Cheers,

[1] https://lore.kernel.org/xen-devel/20210425201318.15447-1-julien@xen.org/

-- 
Julien Grall



* RE: Discussion of Xenheap problems on AArch64
  2021-04-25 20:19     ` Julien Grall
@ 2021-04-27  6:29       ` Henry Wang
  2021-04-28  9:28         ` Henry Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Henry Wang @ 2021-04-27  6:29 UTC (permalink / raw)
  To: Julien Grall, sstabellini, xen-devel
  Cc: Wei Chen, Penny Zheng, Bertrand Marquis

Hi Julien,

Sorry for the late reply, I somehow missed this email....

Please see my inline replies ^^

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Julien Grall
> Sent: Monday, April 26, 2021 4:20 AM
> To: Henry Wang <Henry.Wang@arm.com>; sstabellini@kernel.org; xen-devel@lists.xenproject.org
> Cc: Wei Chen <Wei.Chen@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: Discussion of Xenheap problems on AArch64
> 
> 
> 
> On 21/04/2021 10:32, Henry Wang wrote:
> > Hi Julien,
> 
> Hi Henry,
> 
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: Wednesday, April 21, 2021 5:04 PM
> >> To: Henry Wang <Henry.Wang@arm.com>; sstabellini@kernel.org; xen-devel@lists.xenproject.org
> >> Cc: Wei Chen <Wei.Chen@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; Bertrand Marquis <Bertrand.Marquis@arm.com>
> >> Subject: Re: Discussion of Xenheap problems on AArch64
> >>
> >>
> >>
> >> On 21/04/2021 07:28, Henry Wang wrote:
> >>> Hi,
> >>
> >> Hi Henry,
> >>
> >>>
> >>> We are trying to implement static memory allocation on AArch64. Part of
> >>> this feature is reserved heap memory allocation, where a specific range
> >>> of memory is reserved only for the heap. During development, we found a
> >>> pitfall in the current AArch64 setup_xenheap_mappings() function.
> >>>
> >>> According to a previous discussion in the community
> >>> https://lore.kernel.org/xen-devel/20190216134456.10681-1-peng.fan@nxp.com/,
> >>> on AArch64, bootmem is initialized after setup_xenheap_mappings(), so
> >>> setup_xenheap_mappings() may try to allocate memory before any memory has
> >>> been handed over to the boot allocator. If reserved heap memory allocation
> >>> is introduced, either of the 2 cases below will trigger a crash:
> >>>
> >>> 1. If the reserved heap memory is at the end of the memory block list and
> >>> the gap between reserved and unreserved memory is bigger than 512GB, then
> >>> when we set up mappings from the beginning of the memory block list, the
> >>> boot allocator runs out of pages (OOM). This is because the memory
> >>> reserved for the heap has not yet been mapped and added to the boot
> >>> allocator.
> >>>
> >>> 2. If we add the memory reserved for the heap to the boot allocator first,
> >>> and then set up mappings for the banks in the memory block list, we may
> >>> get a page that has not been mapped yet, causing a data abort.
> >>
> >> There are a few issues with setup_xenheap_mappings(). I have been
> >> reworking the code in my spare time and have started to upstream bits
> >> of it. A PoC can be found here:
> >>
> >> https://xenbits.xen.org/gitweb/?p=people/julieng/xen-unstable.git;a=shortlog;h=refs/heads/pt/dev
> >>
> >
> > Really great news! Thank you very much for the information and your hard
> > work on the PoC :) I will start going through your PoC code then.
> 
> I spent some time today cleaning up the PoC and sent a series to the ML
> (see [1]). It has only been lightly tested so far.
> 
> Would you be able to give it a try and let me know if it helps with your
> problem?

Yes, of course! I will start testing this series ^^ Thank you for your work!

> 
> For convenience, I have pushed a branch with the series applied here:
> 
> https://xenbits.xen.org/gitweb/?p=people/julieng/xen-unstable.git;a=shortlog;h=refs/heads/pt/rfc-v2
> 

Great, thanks!

> Cheers,
> 
> [1] https://lore.kernel.org/xen-devel/20210425201318.15447-1-julien@xen.org/
> 
> --
> Julien Grall



* RE: Discussion of Xenheap problems on AArch64
  2021-04-27  6:29       ` Henry Wang
@ 2021-04-28  9:28         ` Henry Wang
  2021-04-28 12:46           ` Julien Grall
  0 siblings, 1 reply; 15+ messages in thread
From: Henry Wang @ 2021-04-28  9:28 UTC (permalink / raw)
  To: Julien Grall, sstabellini, xen-devel
  Cc: Wei Chen, Penny Zheng, Bertrand Marquis

Hi Julien,

I've done some tests of the patch series in
https://xenbits.xen.org/gitweb/?p=people/julieng/xen-unstable.git;a=shortlog;h=refs/heads/pt/rfc-v2

If you have time, could you please take a look at the inline test results
and let me know whether I tested the patch series correctly? Thanks!

> -----Original Message-----
> From: Henry Wang
> Sent: Tuesday, April 27, 2021 2:29 PM
> To: Julien Grall <julien@xen.org>; sstabellini@kernel.org; xen-devel@lists.xenproject.org
> Cc: Wei Chen <Wei.Chen@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: RE: Discussion of Xenheap problems on AArch64
> 
> Hi Julien,
> 
> Sorry for the late reply, I somehow missed this email....
> 
> Please see my inline replies ^^
> 
> > -----Original Message-----
> > From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Julien Grall
> > Sent: Monday, April 26, 2021 4:20 AM
> > To: Henry Wang <Henry.Wang@arm.com>; sstabellini@kernel.org; xen-devel@lists.xenproject.org
> > Cc: Wei Chen <Wei.Chen@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; Bertrand Marquis <Bertrand.Marquis@arm.com>
> > Subject: Re: Discussion of Xenheap problems on AArch64
> >
> >
> >
> > On 21/04/2021 10:32, Henry Wang wrote:
> > > Hi Julien,
> >
> > Hi Henry,
> >
> > >> -----Original Message-----
> > >> From: Julien Grall <julien@xen.org>
> > >> Sent: Wednesday, April 21, 2021 5:04 PM
> > >> To: Henry Wang <Henry.Wang@arm.com>; sstabellini@kernel.org; xen-devel@lists.xenproject.org
> > >> Cc: Wei Chen <Wei.Chen@arm.com>; Penny Zheng <Penny.Zheng@arm.com>; Bertrand Marquis <Bertrand.Marquis@arm.com>
> > >> Subject: Re: Discussion of Xenheap problems on AArch64
> > >>
> > >>
> > >>
> > >> On 21/04/2021 07:28, Henry Wang wrote:
> > >>> Hi,
> > >>
> > >> Hi Henry,
> > >>
> > >>>
> > >>> We are trying to implement static memory allocation on AArch64. Part of
> > >>> this feature is reserved heap memory allocation, where a specific range
> > >>> of memory is reserved only for the heap. During development, we found a
> > >>> pitfall in the current AArch64 setup_xenheap_mappings() function.
> > >>>
> > >>> According to a previous discussion in the community
> > >>> https://lore.kernel.org/xen-devel/20190216134456.10681-1-peng.fan@nxp.com/,
> > >>> on AArch64, bootmem is initialized after setup_xenheap_mappings(), so
> > >>> setup_xenheap_mappings() may try to allocate memory before any memory has
> > >>> been handed over to the boot allocator. If reserved heap memory allocation
> > >>> is introduced, either of the 2 cases below will trigger a crash:
> > >>>
> > >>> 1. If the reserved heap memory is at the end of the memory block list and
> > >>> the gap between reserved and unreserved memory is bigger than 512GB, then
> > >>> when we set up mappings from the beginning of the memory block list, the
> > >>> boot allocator runs out of pages (OOM). This is because the memory
> > >>> reserved for the heap has not yet been mapped and added to the boot
> > >>> allocator.
> > >>>
> > >>> 2. If we add the memory reserved for the heap to the boot allocator first,
> > >>> and then set up mappings for the banks in the memory block list, we may
> > >>> get a page that has not been mapped yet, causing a data abort.
> > >>
> > >> There are a few issues with setup_xenheap_mappings(). I have been
> > >> reworking the code in my spare time and have started to upstream bits
> > >> of it. A PoC can be found here:
> > >>
> > >> https://xenbits.xen.org/gitweb/?p=people/julieng/xen-unstable.git;a=shortlog;h=refs/heads/pt/dev
> > >>
> > >
> > > Really great news! Thank you very much for the information and your hard
> > > work on the PoC :) I will start going through your PoC code then.
> >
> > I spent some time today cleaning up the PoC and sent a series to the ML
> > (see [1]). It has only been lightly tested so far.
> >
> > Would you be able to give it a try and let me know if it helps with your
> > problem?
> 
> Yes, of course! I will start testing this series ^^ Thank you for your work!
> 

Test platform: FVP_Base_RevC_2xAEMvA (with -C bp.dram_size=1024)

Default memory configuration (works well):
memory@80000000 {
                device_type = "memory";
                reg = <0x00 0x80000000 0x00 0x7f000000 0x08 0x80000000 0x00 0x80000000>;
};

As the lowest DRAM range only has 2GB of RAM
(https://developer.arm.com/documentation/100964/1114/Base-Platform/Base---memory/Base-Platform-memory-map),
I only tested the case of two memory banks with a big gap.

1. Without the patches (commit bea65a212c0581520203b6ad0d07615693f42f73),
using two memory banks with a big gap:

Memory node:
memory@80000000 {
                device_type = "memory";
                reg = <0x00 0x80000000 0x00 0x7f000000 0x8800 0x00000000 0x00 0x80000000>;
};

Log:
(XEN)   VTCR_EL2: 80000000
(XEN)  VTTBR_EL2: 0000000000000000
(XEN)
(XEN)  SCTLR_EL2: 30cd183d
(XEN)    HCR_EL2: 0000000000000038
(XEN)  TTBR0_EL2: 000000008413d000
(XEN)
(XEN)    ESR_EL2: 96000041
(XEN)  HPFAR_EL2: 0000000000000000
(XEN)    FAR_EL2: 00008010c3fff000
(XEN) Xen call trace:
(XEN)    [<000000000025c7a0>] clear_page+0x10/0x2c (PC)
(XEN)    [<00000000002caa30>] setup_frametable_mappings+0x1ac/0x2e0 (LR)
(XEN)    [<00000000002cbf34>] start_xen+0x348/0xbc4
(XEN)    [<00000000002001c0>] arm64/head.o#primary_switched+0x10/0x30
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) CPU0: Unexpected Trap: Data Abort
(XEN) ****************************************

2. With the patches applied, using two memory banks with a big gap:
Memory node:
memory@80000000 {
                device_type = "memory";
                reg = <0x00 0x80000000 0x00 0x7f000000 0x8800 0x00000000 0x00 0x80000000>;
};

Log:
(XEN)   VTCR_EL2: 80000000
(XEN)  VTTBR_EL2: 0000000000000000
(XEN)
(XEN)  SCTLR_EL2: 30cd183d
(XEN)    HCR_EL2: 0000000000000038
(XEN)  TTBR0_EL2: 000000008413c000
(XEN)
(XEN)    ESR_EL2: 96000043
(XEN)  HPFAR_EL2: 0000000000000000
(XEN)    FAR_EL2: 0000000000443000
(XEN)
(XEN) Xen call trace:
(XEN)    [<000000000025c7a0>] clear_page+0x10/0x2c (PC)
(XEN)    [<000000000026cf9c>] mm.c#xen_pt_update+0x1b8/0x7b0 (LR)
(XEN)    [<00000000002ca298>] setup_xenheap_mappings+0xb4/0x134
(XEN)    [<00000000002cc1b0>] start_xen+0xb6c/0xbcc
(XEN)    [<00000000002001c0>] arm64/head.o#primary_switched+0x10/0x30
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) CPU0: Unexpected Trap: Data Abort
(XEN) ****************************************

Kind regards,
Henry

> >
> > For convenience, I have pushed a branch with the series applied here:
> >
> > https://xenbits.xen.org/gitweb/?p=people/julieng/xen-unstable.git;a=shortlog;h=refs/heads/pt/rfc-v2
> >
> 
> Great, thanks!
> 
> > Cheers,
> >
> > [1] https://lore.kernel.org/xen-devel/20210425201318.15447-1-julien@xen.org/
> >
> > --
> > Julien Grall



* Re: Discussion of Xenheap problems on AArch64
  2021-04-28  9:28         ` Henry Wang
@ 2021-04-28 12:46           ` Julien Grall
  2021-05-07  4:06             ` Henry Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Julien Grall @ 2021-04-28 12:46 UTC (permalink / raw)
  To: Henry Wang, sstabellini, xen-devel
  Cc: Wei Chen, Penny Zheng, Bertrand Marquis

On 28/04/2021 10:28, Henry Wang wrote:
> Hi Julien,

Hi Henry,

> 
> I've done some tests of the patch series in
> https://xenbits.xen.org/gitweb/?p=people/julieng/xen-unstable.git;a=shortlog;h=refs/heads/pt/rfc-v2
> 

Thank you for testing. Some questions below.

> Log:
> (XEN)   VTCR_EL2: 80000000
> (XEN)  VTTBR_EL2: 0000000000000000
> (XEN)
> (XEN)  SCTLR_EL2: 30cd183d
> (XEN)    HCR_EL2: 0000000000000038
> (XEN)  TTBR0_EL2: 000000008413d000
> (XEN)
> (XEN)    ESR_EL2: 96000041
> (XEN)  HPFAR_EL2: 0000000000000000
> (XEN)    FAR_EL2: 00008010c3fff000
> (XEN) Xen call trace:
> (XEN)    [<000000000025c7a0>] clear_page+0x10/0x2c (PC)
> (XEN)    [<00000000002caa30>] setup_frametable_mappings+0x1ac/0x2e0 (LR)
> (XEN)    [<00000000002cbf34>] start_xen+0x348/0xbc4
> (XEN)    [<00000000002001c0>] arm64/head.o#primary_switched+0x10/0x30
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) CPU0: Unexpected Trap: Data Abort
> (XEN) ****************************************
> 
> 2. With the patches applied, using two memory banks with a big gap:
> Memory node:
> memory@80000000 {
>                  device_type = "memory";
>                  reg = <0x00 0x80000000 0x00 0x7f000000 0x8800 0x00000000 0x00 0x80000000>;
> };
> 
> Log:
> (XEN)   VTCR_EL2: 80000000
> (XEN)  VTTBR_EL2: 0000000000000000
> (XEN)
> (XEN)  SCTLR_EL2: 30cd183d
> (XEN)    HCR_EL2: 0000000000000038
> (XEN)  TTBR0_EL2: 000000008413c000
> (XEN)
> (XEN)    ESR_EL2: 96000043
> (XEN)  HPFAR_EL2: 0000000000000000
> (XEN)    FAR_EL2: 0000000000443000
> (XEN)
> (XEN) Xen call trace:
> (XEN)    [<000000000025c7a0>] clear_page+0x10/0x2c (PC)
> (XEN)    [<000000000026cf9c>] mm.c#xen_pt_update+0x1b8/0x7b0 (LR)
> (XEN)    [<00000000002ca298>] setup_xenheap_mappings+0xb4/0x134
> (XEN)    [<00000000002cc1b0>] start_xen+0xb6c/0xbcc
> (XEN)    [<00000000002001c0>] arm64/head.o#primary_switched+0x10/0x30
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) CPU0: Unexpected Trap: Data Abort
> (XEN) ****************************************

I am a bit confused by the output with and without my patches. Both of
them show a data abort in clear_page().

Above, you suggested that there is a big gap between the two memory
banks. Do the banks still point to actual RAM?

Cheers,

-- 
Julien Grall



* RE: Discussion of Xenheap problems on AArch64
  2021-04-28 12:46           ` Julien Grall
@ 2021-05-07  4:06             ` Henry Wang
  2021-05-10 16:58               ` Julien Grall
  0 siblings, 1 reply; 15+ messages in thread
From: Henry Wang @ 2021-05-07  4:06 UTC (permalink / raw)
  To: Julien Grall, sstabellini, xen-devel
  Cc: Wei Chen, Penny Zheng, Bertrand Marquis

Hi Julien,

> From: Julien Grall <julien@xen.org>
> On 28/04/2021 10:28, Henry Wang wrote:
> > Hi Julien,
> 
> Hi Henry,
> 
> >
> > I've done some tests of the patch series in
> > https://xenbits.xen.org/gitweb/?p=people/julieng/xen-unstable.git;a=shortlog;h=refs/heads/pt/rfc-v2
> >
> 
> Thank you for testing. Some questions below.
> 
> I am a bit confused by the output with and without my patches. Both of
> them show a data abort in clear_page().
> 
> Above, you suggested that there is a big gap between the two memory
> banks. Do the banks still point to actual RAM?

Sorry again for the very late reply. We had a 5-day public holiday in
China, and it also took me some time to figure out how to configure the
FVP (it turned out I had to set -C bp.secure_memory=false to access
some parts of memory above 4GB).

Yes, you are absolutely right. In my previous test, the higher memory was
not valid. After turning off FVP secure memory, I tried 2 test cases this time:

1. Using reg = <0x00 0x80000000 0x00 0x7f000000 0xf8 0x00000000 0x00 0x80000000>;

In this case, the guest boots successfully.

2. Using reg = <0x00 0x80000000 0x00 0x7f000000 0xf9 0x00000000 0x00 0x80000000>;

First, I confirmed that the memory is valid by using the md command on the U-Boot command line:

VExpress64# md 0xf900000000
f900000000: dfdfdfcf cfdfdfdf dfdfdfcf cfdfdfdf    ................
VExpress64# md 0xf980000000
f980000000: dfdfdfcf cfdfdfdf dfdfdfcf cfdfdfdf    ................

When I continued booting Xen, I got the following error log:

(XEN) CPU:    0
(XEN) PC:     00000000002b5a5c alloc_boot_pages+0x94/0x98
(XEN) LR:     00000000002ca3bc
(XEN) SP:     00000000002ffde0
(XEN) CPSR:   600003c9 MODE:64-bit EL2h (Hypervisor, handler)
(XEN)
(XEN)   VTCR_EL2: 80000000
(XEN)  VTTBR_EL2: 0000000000000000
(XEN)
(XEN)  SCTLR_EL2: 30cd183d
(XEN)    HCR_EL2: 0000000000000038
(XEN)  TTBR0_EL2: 000000008413c000
(XEN)
(XEN)    ESR_EL2: f2000001
(XEN)  HPFAR_EL2: 0000000000000000
(XEN)    FAR_EL2: 0000000000000000
(XEN)
(XEN) Xen call trace:
(XEN)    [<00000000002b5a5c>] alloc_boot_pages+0x94/0x98 (PC)
(XEN)    [<00000000002ca3bc>] setup_frametable_mappings+0xa4/0x108 (LR)
(XEN)    [<00000000002ca3bc>] setup_frametable_mappings+0xa4/0x108
(XEN)    [<00000000002cb988>] start_xen+0x344/0xbcc
(XEN)    [<00000000002001c0>] arm64/head.o#primary_switched+0x10/0x30
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Xen BUG at page_alloc.c:432
(XEN) ****************************************

We can continue our discussion from here. Thanks ^^

Kind regards,

Henry

> 
> Cheers,
> 
> --
> Julien Grall


* Re: Discussion of Xenheap problems on AArch64
  2021-05-07  4:06             ` Henry Wang
@ 2021-05-10 16:58               ` Julien Grall
  2021-05-11  1:11                 ` Henry Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Julien Grall @ 2021-05-10 16:58 UTC (permalink / raw)
  To: Henry Wang, sstabellini, xen-devel
  Cc: Wei Chen, Penny Zheng, Bertrand Marquis

Hi Henry,

On 07/05/2021 05:06, Henry Wang wrote:
>> From: Julien Grall <julien@xen.org>
>> On 28/04/2021 10:28, Henry Wang wrote:
>>> Hi Julien,
>>
>> Hi Henry,
>>
>>>
>>> I've done some tests of the patch series in
>>> https://xenbits.xen.org/gitweb/?p=people/julieng/xen-unstable.git;a=shortlog;h=refs/heads/pt/rfc-v2
>>>
>>
>> Thank you for testing. Some questions below.
>>
>> I am a bit confused by the output with and without my patches. Both of
>> them show a data abort in clear_page().
>>
>> Above, you suggested that there is a big gap between the two memory
>> banks. Do the banks still point to actual RAM?
> 
> Sorry again for the very late reply. We had a 5-day public holiday in
> China, and it also took me some time to figure out how to configure the
> FVP (it turned out I had to set -C bp.secure_memory=false to access
> some parts of memory above 4GB).

No worries. I had never tried to tweak the memory layout on the FVP before.
It is good to know it can be done to properly test memory issues :).

[...]

> When I continued booting Xen, I got the following error log:
> 
> (XEN) CPU:    0
> (XEN) PC:     00000000002b5a5c alloc_boot_pages+0x94/0x98
> (XEN) LR:     00000000002ca3bc
> (XEN) SP:     00000000002ffde0
> (XEN) CPSR:   600003c9 MODE:64-bit EL2h (Hypervisor, handler)
> (XEN)
> (XEN)   VTCR_EL2: 80000000
> (XEN)  VTTBR_EL2: 0000000000000000
> (XEN)
> (XEN)  SCTLR_EL2: 30cd183d
> (XEN)    HCR_EL2: 0000000000000038
> (XEN)  TTBR0_EL2: 000000008413c000
> (XEN)
> (XEN)    ESR_EL2: f2000001
> (XEN)  HPFAR_EL2: 0000000000000000
> (XEN)    FAR_EL2: 0000000000000000
> (XEN)
> (XEN) Xen call trace:
> (XEN)    [<00000000002b5a5c>] alloc_boot_pages+0x94/0x98 (PC)
> (XEN)    [<00000000002ca3bc>] setup_frametable_mappings+0xa4/0x108 (LR)
> (XEN)    [<00000000002ca3bc>] setup_frametable_mappings+0xa4/0x108
> (XEN)    [<00000000002cb988>] start_xen+0x344/0xbcc
> (XEN)    [<00000000002001c0>] arm64/head.o#primary_switched+0x10/0x30
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Xen BUG at page_alloc.c:432
> (XEN) ****************************************

This is happening without my patch series applied, right? If so, what
happens if you apply it?

Cheers,

-- 
Julien Grall



* RE: Discussion of Xenheap problems on AArch64
  2021-05-10 16:58               ` Julien Grall
@ 2021-05-11  1:11                 ` Henry Wang
  2021-05-13 18:18                   ` Julien Grall
  0 siblings, 1 reply; 15+ messages in thread
From: Henry Wang @ 2021-05-11  1:11 UTC (permalink / raw)
  To: Julien Grall, sstabellini, xen-devel
  Cc: Wei Chen, Penny Zheng, Bertrand Marquis

Hi Julien,

> From: Julien Grall <julien@xen.org>
> Hi Henry,
> 
> On 07/05/2021 05:06, Henry Wang wrote:
> >> From: Julien Grall <julien@xen.org>
> >> On 28/04/2021 10:28, Henry Wang wrote:
> [...]
> 
> > When I continued booting Xen, I got the following error log:
> >
> > (XEN) CPU:    0
> > (XEN) PC:     00000000002b5a5c alloc_boot_pages+0x94/0x98
> > (XEN) LR:     00000000002ca3bc
> > (XEN) SP:     00000000002ffde0
> > (XEN) CPSR:   600003c9 MODE:64-bit EL2h (Hypervisor, handler)
> > (XEN)
> > (XEN)   VTCR_EL2: 80000000
> > (XEN)  VTTBR_EL2: 0000000000000000
> > (XEN)
> > (XEN)  SCTLR_EL2: 30cd183d
> > (XEN)    HCR_EL2: 0000000000000038
> > (XEN)  TTBR0_EL2: 000000008413c000
> > (XEN)
> > (XEN)    ESR_EL2: f2000001
> > (XEN)  HPFAR_EL2: 0000000000000000
> > (XEN)    FAR_EL2: 0000000000000000
> > (XEN)
> > (XEN) Xen call trace:
> > (XEN)    [<00000000002b5a5c>] alloc_boot_pages+0x94/0x98 (PC)
> > (XEN)    [<00000000002ca3bc>] setup_frametable_mappings+0xa4/0x108 (LR)
> > (XEN)    [<00000000002ca3bc>] setup_frametable_mappings+0xa4/0x108
> > (XEN)    [<00000000002cb988>] start_xen+0x344/0xbcc
> > (XEN)    [<00000000002001c0>] arm64/head.o#primary_switched+0x10/0x30
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 0:
> > (XEN) Xen BUG at page_alloc.c:432
> > (XEN) ****************************************
> 
> This is happening without my patch series applied, right? If so, what
> happens if you apply it?

No, I am afraid this is with your patch series applied, and that is why I
am a little bit confused about the error log...

Thanks for your patience.

Kind regards,

Henry

> 
> Cheers,
> 
> --
> Julien Grall


* Re: Discussion of Xenheap problems on AArch64
  2021-05-11  1:11                 ` Henry Wang
@ 2021-05-13 18:18                   ` Julien Grall
  2021-05-14  4:35                     ` Henry Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Julien Grall @ 2021-05-13 18:18 UTC
  To: Henry Wang, sstabellini, xen-devel
  Cc: Wei Chen, Penny Zheng, Bertrand Marquis



On 11/05/2021 02:11, Henry Wang wrote:
> Hi Julien,

Hi Henry,

> 
>> From: Julien Grall <julien@xen.org>
>> Hi Henry,
>>
>> On 07/05/2021 05:06, Henry Wang wrote:
>>>> From: Julien Grall <julien@xen.org>
>>>> On 28/04/2021 10:28, Henry Wang wrote:
>> [...]
>>
>>> when I continue booting Xen, I got following error log:
>>> [...]
>>> (XEN) Xen BUG at page_alloc.c:432
>>> (XEN) ****************************************
>>
>> This is happening without my patch series applied, right? If so, what
>> happens if you apply it?
> 
> No, I am afraid this is with your patch series applied, and that is why I
> am a little bit confused about the error log...

You are hitting the BUG() at the end of alloc_boot_pages(). This is hit 
because the boot allocator couldn't allocate memory for your request.

Would you be able to apply the following diff and paste the output here?

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index ace6333c18ea..dbb736fdb275 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -329,6 +329,8 @@ void __init init_boot_pages(paddr_t ps, paddr_t pe)
      if ( pe <= ps )
          return;

+    printk("%s: ps %"PRI_paddr" pe %"PRI_paddr"\n", __func__, ps, pe);
+
      first_valid_mfn = mfn_min(maddr_to_mfn(ps), first_valid_mfn);

      bootmem_region_add(ps >> PAGE_SHIFT, pe >> PAGE_SHIFT);
@@ -395,6 +397,8 @@ mfn_t __init alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align)
      unsigned long pg, _e;
      unsigned int i = nr_bootmem_regions;

+    printk("%s: nr_pfns %lu pfn_align %lu\n", __func__, nr_pfns, pfn_align);
+
      BUG_ON(!nr_bootmem_regions);

      while ( i-- )

Cheers,

-- 
Julien Grall



* RE: Discussion of Xenheap problems on AArch64
  2021-05-13 18:18                   ` Julien Grall
@ 2021-05-14  4:35                     ` Henry Wang
  2021-05-15 19:11                       ` Julien Grall
  0 siblings, 1 reply; 15+ messages in thread
From: Henry Wang @ 2021-05-14  4:35 UTC
  To: Julien Grall, sstabellini, xen-devel
  Cc: Wei Chen, Penny Zheng, Bertrand Marquis

> From: Julien Grall <julien@xen.org>
Hi Julien,

> 
> On 11/05/2021 02:11, Henry Wang wrote:
> [...]
> 
> You are hitting the BUG() at the end of alloc_boot_pages(). This is hit
> because the boot allocator couldn't allocate memory for your request.
> 
> Would you be able to apply the following diff and paste the output here?

Thank you, yes of course. Please see the output below :)

> 
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index ace6333c18ea..dbb736fdb275 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -329,6 +329,8 @@ void __init init_boot_pages(paddr_t ps, paddr_t pe)
>       if ( pe <= ps )
>           return;
> 
> +    printk("%s: ps %"PRI_paddr" pe %"PRI_paddr"\n", __func__, ps, pe);
                                              ^ FYI: I had to change this PRI_paddr to PRIpaddr
                                                 to make the compiler happy

> +
>       first_valid_mfn = mfn_min(maddr_to_mfn(ps), first_valid_mfn);
> 
>       bootmem_region_add(ps >> PAGE_SHIFT, pe >> PAGE_SHIFT);
> @@ -395,6 +397,8 @@ mfn_t __init alloc_boot_pages(unsigned long nr_pfns, unsigned long pfn_align)
>       unsigned long pg, _e;
>       unsigned int i = nr_bootmem_regions;
> 
> +    printk("%s: nr_pfns %lu pfn_align %lu\n", __func__, nr_pfns, pfn_align);
> +
>       BUG_ON(!nr_bootmem_regions);
> 
>       while ( i-- )
> 

I also added some printk() calls to make sure the DTB is parsed correctly. For the
error case, I get the following log:

(XEN) ----------banks=2--------
(XEN) ----------start=80000000--------
(XEN) ----------size=7F000000--------
(XEN) ----------start=F900000000--------
(XEN) ----------size=80000000--------
(XEN) Checking for initrd in /chosen
(XEN) RAM: 0000000080000000 - 00000000feffffff
(XEN) RAM: 000000f900000000 - 000000f97fffffff
(XEN)
(XEN) MODULE[0]: 0000000084000000 - 00000000841464c8 Xen
(XEN) MODULE[1]: 00000000841464c8 - 0000000084148c9b Device Tree
(XEN) MODULE[2]: 0000000080080000 - 0000000081080000 Kernel
(XEN)  RESVD[0]: 0000000080000000 - 0000000080010000
(XEN)
(XEN) Command line: noreboot dom0_mem=1024M console=dtuart dtuart=serial0 bootscrub=0
(XEN) PFN compression on bits 21...22
(XEN) init_boot_pages: ps 0000000080010000 pe 0000000080080000
(XEN) init_boot_pages: ps 0000000081080000 pe 0000000084000000
(XEN) init_boot_pages: ps 0000000084149000 pe 00000000ff000000
(XEN) alloc_boot_pages: nr_pfns 1 pfn_align 1
(XEN) alloc_boot_pages: nr_pfns 1 pfn_align 1
(XEN) alloc_boot_pages: nr_pfns 1 pfn_align 1
(XEN) init_boot_pages: ps 000000f900000000 pe 000000f980000000
(XEN) alloc_boot_pages: nr_pfns 909312 pfn_align 8192
(XEN) Xen BUG at page_alloc.c:436

For comparison, here is the log with the second memory bank starting at f800000000,
the highest start address at which Xen still boots correctly:

(XEN) ----------banks=2--------
(XEN) ----------start=80000000--------
(XEN) ----------size=7F000000--------
(XEN) ----------start=F800000000--------
(XEN) ----------size=80000000--------
(XEN) Checking for initrd in /chosen
(XEN) RAM: 0000000080000000 - 00000000feffffff
(XEN) RAM: 000000f800000000 - 000000f87fffffff
(XEN)
(XEN) MODULE[0]: 0000000084000000 - 00000000841464c8 Xen
(XEN) MODULE[1]: 00000000841464c8 - 0000000084148c9b Device Tree
(XEN) MODULE[2]: 0000000080080000 - 0000000081080000 Kernel
(XEN)  RESVD[0]: 0000000080000000 - 0000000080010000
(XEN)
(XEN) Command line: noreboot dom0_mem=1024M console=dtuart dtuart=serial0 bootscrub=0
(XEN) PFN compression on bits 20...22
(XEN) init_boot_pages: ps 0000000080010000 pe 0000000080080000
(XEN) init_boot_pages: ps 0000000081080000 pe 0000000084000000
(XEN) init_boot_pages: ps 0000000084149000 pe 00000000ff000000
(XEN) alloc_boot_pages: nr_pfns 1 pfn_align 1
(XEN) alloc_boot_pages: nr_pfns 1 pfn_align 1
(XEN) alloc_boot_pages: nr_pfns 1 pfn_align 1
(XEN) init_boot_pages: ps 000000f800000000 pe 000000f880000000
(XEN) alloc_boot_pages: nr_pfns 450560 pfn_align 8192
(XEN) alloc_boot_pages: nr_pfns 1 pfn_align 1
(...A lot of (XEN) alloc_boot_pages: nr_pfns 1 pfn_align 1...)
(XEN) Domain heap initialised
(XEN) Booting using Device Tree

I hope this helps. Thank you.

Kind regards,

Henry

> Cheers,
> 
> --
> Julien Grall


* Re: Discussion of Xenheap problems on AArch64
  2021-05-14  4:35                     ` Henry Wang
@ 2021-05-15 19:11                       ` Julien Grall
  2021-05-17  6:38                         ` Henry Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Julien Grall @ 2021-05-15 19:11 UTC
  To: Henry Wang, sstabellini, xen-devel
  Cc: Wei Chen, Penny Zheng, Bertrand Marquis

Hi Henry,

On 14/05/2021 05:35, Henry Wang wrote:
>> From: Julien Grall <julien@xen.org>
> Hi Julien,
> 
>> [...]
>>
>> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
>> index ace6333c18ea..dbb736fdb275 100644
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -329,6 +329,8 @@ void __init init_boot_pages(paddr_t ps, paddr_t pe)
>>        if ( pe <= ps )
>>            return;
>>
>> +    printk("%s: ps %"PRI_paddr" pe %"PRI_paddr"\n", __func__, ps, pe);
>                                               ^ FYI: I had to change this PRI_paddr to PRIpaddr
>                                                  to make the compiler happy

Ah yes, we don't have a variant with _. I thought I had compile-tested it
before sending it :(.

> 
>> [...]
> 
> I also added some printk() calls to make sure the DTB is parsed correctly. For the
> error case, I get the following log:

Thank you for the log.

> 
> (XEN) ----------banks=2--------
> (XEN) ----------start=80000000--------
> (XEN) ----------size=7F000000--------
> (XEN) ----------start=F900000000--------
> (XEN) ----------size=80000000--------
> (XEN) Checking for initrd in /chosen
> (XEN) RAM: 0000000080000000 - 00000000feffffff
> (XEN) RAM: 000000f900000000 - 000000f97fffffff
> (XEN)
> (XEN) MODULE[0]: 0000000084000000 - 00000000841464c8 Xen
> (XEN) MODULE[1]: 00000000841464c8 - 0000000084148c9b Device Tree
> (XEN) MODULE[2]: 0000000080080000 - 0000000081080000 Kernel
> (XEN)  RESVD[0]: 0000000080000000 - 0000000080010000
> (XEN)
> (XEN) Command line: noreboot dom0_mem=1024M console=dtuart dtuart=serial0 bootscrub=0
> (XEN) PFN compression on bits 21...22
> (XEN) init_boot_pages: ps 0000000080010000 pe 0000000080080000

The size of this region is 448KB.

> (XEN) init_boot_pages: ps 0000000081080000 pe 0000000084000000

The size of this region is 47MB.

> (XEN) init_boot_pages: ps 0000000084149000 pe 00000000ff000000

The size of this region is 1966MB.


> (XEN) alloc_boot_pages: nr_pfns 1 pfn_align 1
> (XEN) alloc_boot_pages: nr_pfns 1 pfn_align 1
> (XEN) alloc_boot_pages: nr_pfns 1 pfn_align 1
> (XEN) init_boot_pages: ps 000000f900000000 pe 000000f980000000

The size of this region is 2048MB.

> (XEN) alloc_boot_pages: nr_pfns 909312 pfn_align 8192

This is asking for 3552MB of contiguous memory, which cannot be
accommodated (the largest region is 2048MB). In any case, this is quite a large region to ask for.

Same...

> (XEN) Xen BUG at page_alloc.c:436
> 
> For comparison, here is the log with the second memory bank starting at f800000000,
> the highest start address at which Xen still boots correctly:
> 
> (XEN) ----------banks=2--------
> (XEN) ----------start=80000000--------
> (XEN) ----------size=7F000000--------
> (XEN) ----------start=F800000000--------
> (XEN) ----------size=80000000--------
> (XEN) Checking for initrd in /chosen
> (XEN) RAM: 0000000080000000 - 00000000feffffff
> (XEN) RAM: 000000f800000000 - 000000f87fffffff
> (XEN)
> (XEN) MODULE[0]: 0000000084000000 - 00000000841464c8 Xen
> (XEN) MODULE[1]: 00000000841464c8 - 0000000084148c9b Device Tree
> (XEN) MODULE[2]: 0000000080080000 - 0000000081080000 Kernel
> (XEN)  RESVD[0]: 0000000080000000 - 0000000080010000
> (XEN)
> (XEN) Command line: noreboot dom0_mem=1024M console=dtuart dtuart=serial0 bootscrub=0
> (XEN) PFN compression on bits 20...22
> (XEN) init_boot_pages: ps 0000000080010000 pe 0000000080080000
> (XEN) init_boot_pages: ps 0000000081080000 pe 0000000084000000
> (XEN) init_boot_pages: ps 0000000084149000 pe 00000000ff000000
> (XEN) alloc_boot_pages: nr_pfns 1 pfn_align 1
> (XEN) alloc_boot_pages: nr_pfns 1 pfn_align 1
> (XEN) alloc_boot_pages: nr_pfns 1 pfn_align 1
> (XEN) init_boot_pages: ps 000000f800000000 pe 000000f880000000
> (XEN) alloc_boot_pages: nr_pfns 450560 pfn_align 8192

... here. We are trying to allocate a frametable of roughly 1.7GB (1760MB). You
have only 4GB of memory so the frametable should be a lot smaller (a few tens of MB).

This is happening because PDX is only able to find a few bits to compress
(bits 21...22 in the failing layout, 20...22 in the one that boots).
I am not sure we can compress more with the current PDX algorithm. This 
may require some extensive improvement to reduce the footprint.

In a previous e-mail, you said you tweaked the FVP model to set those
regions. Were you trying to mimic the memory layout of real HW
(either current or future)?

Cheers,

-- 
Julien Grall



* RE: Discussion of Xenheap problems on AArch64
  2021-05-15 19:11                       ` Julien Grall
@ 2021-05-17  6:38                         ` Henry Wang
  2021-05-18 14:09                           ` Julien Grall
  0 siblings, 1 reply; 15+ messages in thread
From: Henry Wang @ 2021-05-17  6:38 UTC
  To: Julien Grall, sstabellini, xen-devel
  Cc: Wei Chen, Penny Zheng, Bertrand Marquis


> From: Julien Grall <julien@xen.org>
Hi Julien,

> Hi Henry,
> 
> >>>> [...]
> 
> Ah yes, we don't have a variant with _. I thought I had compile-tested it
> before sending it :(.

No worries :)

> 
> [...]
> > (XEN) alloc_boot_pages: nr_pfns 450560 pfn_align 8192
> 
> ... here. We are trying to allocate a frametable of roughly 1.7GB (1760MB). You
> have only 4GB of memory so the frametable should be a lot smaller (a few tens of MB).
> 
> This is happening because PDX is only able to find a few bits to compress
> (bits 21...22 in the failing layout, 20...22 in the one that boots).
> I am not sure we can compress more with the current PDX algorithm. This
> may require some extensive improvement to reduce the footprint.

Yes, you are right, so I don't have any more questions. Thanks very much
for the detailed explanation.

> 
> In a previous e-mail, you said you tweaked the FVP model to set those
> regions. Were you trying to mimic the memory layout of real HW
> (either current or future)?

Not really, I was just trying to cover as many cases as possible and these
regions were just picked for testing your patchset in different scenarios.

As the issue is related to the PDX algorithm rather than the heap allocation,
and the "allocating a big heap or two heap banks with a big gap" case has been
tested, I think this patchset is perfect ^^ Thank you.

Kind regards,

Henry

> 
> Cheers,
> 
> --
> Julien Grall


* Re: Discussion of Xenheap problems on AArch64
  2021-05-17  6:38                         ` Henry Wang
@ 2021-05-18 14:09                           ` Julien Grall
  0 siblings, 0 replies; 15+ messages in thread
From: Julien Grall @ 2021-05-18 14:09 UTC
  To: Henry Wang, sstabellini, xen-devel
  Cc: Wei Chen, Penny Zheng, Bertrand Marquis

Hi Henry,

On 17/05/2021 07:38, Henry Wang wrote:
> 
>> From: Julien Grall <julien@xen.org>
>> In a previous e-mail, you said you tweaked the FVP model to set those
>> regions. Were you trying to mimic the memory layout of real HW
>> (either current or future)?
> 
> Not really, I was just trying to cover as many cases as possible and these
> regions were just picked for testing your patchset in different scenarios.

Thanks for the confirmation. It means we don't have to fix it right now. :).

Cheers,

-- 
Julien Grall



end of thread, other threads:[~2021-05-18 14:09 UTC | newest]

Thread overview: 15+ messages
2021-04-21  6:28 Discussion of Xenheap problems on AArch64 Henry Wang
2021-04-21  9:03 ` Julien Grall
2021-04-21  9:32   ` Henry Wang
2021-04-25 20:19     ` Julien Grall
2021-04-27  6:29       ` Henry Wang
2021-04-28  9:28         ` Henry Wang
2021-04-28 12:46           ` Julien Grall
2021-05-07  4:06             ` Henry Wang
2021-05-10 16:58               ` Julien Grall
2021-05-11  1:11                 ` Henry Wang
2021-05-13 18:18                   ` Julien Grall
2021-05-14  4:35                     ` Henry Wang
2021-05-15 19:11                       ` Julien Grall
2021-05-17  6:38                         ` Henry Wang
2021-05-18 14:09                           ` Julien Grall
