On Wed, Nov 16, 2016 at 10:53:36AM -0600, Alex Thorlton wrote:
> On Wed, Nov 16, 2016 at 05:42:11PM +0100, Juergen Gross wrote:
> > On 15/11/16 08:15, Jan Beulich wrote:
> > >>>> On 15.11.16 at 07:33, <JGross@suse.com> wrote:
> > >> On 15/11/16 01:11, Alex Thorlton wrote:
> > >>> Hey everyone,
> > >>>
> > >>> We're having problems with large systems hitting a BUG in
> > >>> xen_memory_setup, due to extra e820 entries created in the
> > >>> XENMEM_machine_memory_map callback.  The change in the patch gets things
> > >>> working, but Boris and I wanted to get opinions on whether or not this
> > >>> is the appropriate/entire solution, which is why I've sent it as an RFC
> > >>> for now.
> > 
> > >> While I think extending the e820 table is the right thing to do I'm
> > >> questioning the assumptions here.
> > >>
> > >> Looking briefly through the Xen hypervisor sources I think it isn't
> > >> yet ready for such large machines: the hypervisor's e820 map seems to
> > >> be still limited to 128 e820 entries. Jan, did I overlook an EFI
> > >> specific path extending this limitation?
> > > 
> > > No, you didn't. I do question the correlation with "large machines"
> > > here though: The issue isn't with large machines afaict, but with
> > > ones having very many entries (i.e. heavily fragmented).
> > 
> > Alex, I would appreciate if you could send me the E820 map printed
> > at a bare metal Linux boot. I suspect it is already larger than
> > 128 entries and the hypervisor is just cutting it off at the end.
> 
> No problem!  I'll get this to you today.

I've attached a boot log from a 4.9-rc5 kernel.  Sorry this took me a
little longer than I expected!

AFAICT, the kernel running on bare metal reports the same 115 e820
entries that the hypervisor was reporting on previous boots, so it
doesn't look like the map is being cut off at 128.  I believe that our
failure was due to 14 more entries being added during
XENMEM_machine_memory_map, which would bring the total to 129.  I have
some debug output that shows this, if you'd like to see it.
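
To spell out the arithmetic, here is a rough sketch of the check I have
in mind.  This is not the actual kernel or hypervisor code: the constant
and both helper names are made up for illustration, and I'm assuming the
128-entry limit mentioned earlier in the thread is the one that applies
to the table being filled in.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: the 128-entry limit discussed earlier in this thread. */
#define SKETCH_E820_MAX 128

/*
 * Hypothetical helper: true if `extra` additional entries still fit
 * in a table that already holds `reported` entries.
 */
static bool sketch_e820_fits(size_t reported, size_t extra)
{
	return reported <= SKETCH_E820_MAX &&
	       extra <= SKETCH_E820_MAX - reported;
}

/*
 * Hypothetical helper: number of entries past the limit, or 0 if
 * everything fits.
 */
static size_t sketch_e820_overflow(size_t reported, size_t extra)
{
	size_t total = reported + extra;

	return total > SKETCH_E820_MAX ? total - SKETCH_E820_MAX : 0;
}
```

With the numbers from this machine, sketch_e820_fits(115, 14) is false
and sketch_e820_overflow(115, 14) is 1, i.e. a single entry over the
limit would be enough to trip the BUG if my reading is right.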

- Alex