* Kexec & Memory Zones question
@ 2011-05-04 18:35 Sujit V
  2011-05-10  9:42 ` WANG Cong
  2011-05-11 15:09 ` Vivek Goyal
  0 siblings, 2 replies; 6+ messages in thread
From: Sujit V @ 2011-05-04 18:35 UTC (permalink / raw)
  To: kexec

Our x86_64 NUMA hardware, running Linux 2.6.23 with two memory nodes,
has the following zone layout:
DMA     0 to 16MB
DMA32   16MB to 4GB
NORMAL  4GB to 96GB

We set the crashkernel boot parameter to 128M@16M.
I am using kexec-tools-2.0.0.
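
As an illustration of what the size@offset form encodes, here is a minimal C sketch that splits such a string into a size and a base address in bytes. The function names `parse_size` and `parse_crashkernel_arg` are hypothetical and are not taken from the kernel or from kexec-tools; the sketch only assumes that K, M and G are binary suffixes.

```c
#include <stdint.h>
#include <stdlib.h>

/* Parse a number with an optional K/M/G suffix; *end points past what was consumed. */
static uint64_t parse_size(const char *s, const char **end)
{
    char *p;
    uint64_t v = strtoull(s, &p, 0);

    switch (*p) {
    case 'G': case 'g': v <<= 10; /* fall through */
    case 'M': case 'm': v <<= 10; /* fall through */
    case 'K': case 'k': v <<= 10; p++; break;
    default: break;
    }
    *end = p;
    return v;
}

/*
 * Hypothetical helper: split "size@offset" into byte values.
 * Returns 0 on success, -1 if the string is malformed.
 */
int parse_crashkernel_arg(const char *arg, uint64_t *size, uint64_t *base)
{
    const char *p, *q;

    *size = parse_size(arg, &p);
    if (*size == 0 || *p != '@')
        return -1;
    q = p + 1;
    *base = parse_size(q, &p);
    /* Require at least one character consumed and nothing left over. */
    return (p != q && *p == '\0') ? 0 : -1;
}
```

For "128M@16M" this yields a size of 134217728 bytes and a base of 16777216 bytes, i.e. a reservation covering 16MB to 144MB. The sketch rejects trailing characters after the suffix; it says nothing about how a particular kernel version treats them.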

We observed that the reserved crashkernel memory was being used by
the system.
Using kdb, we found that memory contents around the 17MB physical
address were changing, which proved that the system was using memory
that is actually reserved for the crash kernel.
[ If I load the crash kernel using kexec, the system crashes. The
back trace is always in the megasas driver. ]

My question is:
1) Should the crashkernel memory be located past the DMA32 zone?

I have tried the following:
(1) crashkernel=128M@4GB  (so the memory reservation is past DMA32)
In this scenario kexec-tools gave the error "Could not find a free
area of memory of xyz bytes".

(2) I changed the max of DMA32 to 1GB and used
     crashkernel=128M@1G
kexec still gave the same error "Could not find a free area of memory
of xyz bytes".

Is there any specific restriction on where the crashkernel memory
can be located?

Is it OK for it to be in the DMA32 region, or should it be beyond the
DMA32 region?

Thanks

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


* Re: Kexec & Memory Zones question
  2011-05-04 18:35 Kexec & Memory Zones question Sujit V
@ 2011-05-10  9:42 ` WANG Cong
  2011-05-11 15:09 ` Vivek Goyal
  1 sibling, 0 replies; 6+ messages in thread
From: WANG Cong @ 2011-05-10  9:42 UTC (permalink / raw)
  To: kexec

On Wed, 04 May 2011 11:35:46 -0700, Sujit V wrote:

> On our x86_64 NUMA hardware running linux 2.6.23 with two memory nodes
>  have the following zone layout
> DMA   0 - 16MB
> DMA32   16MB to 4GB
> NORMAL 4GB to 96GB
> 
> We had the crashkernel boot param as 128M@16M. I am using
> kexec-tools-2.0.0
> 
> We observed that the reserved crashkernel memory was getting used by the
> system.
> We found out by kdb  that around 17MB of physical memory the memory
> contents were changing which proved that the system was using the memory
> which is actually reserved for crash kernel. [ If I load crash kernel
> using kexec then the system would crash. The back trace was always in
> the megasas driver. ]
> 
> My Question is
> 1) Should the crashkernel memory be located past the DMA32 zone.?


Your DMA32 zone looks extremely large, probably because you enabled NUMA?
What does dmesg say, if you can share it? Also, what is your kernel version?

> 
> I have tried the following
> (1) crashkernel=128M@4GB  ( So the memory reservation is past DMA32) In
> this scenario the kexec tools gave an error "Could not find a free area
> of memory of xyz bytes"
> 
> (2) I changed the max of DMA32 to 1GB
>      crashkernel=128M@1G
> Still kexec gave the same error "Could not find a free area of memory of
> xyz bytes"
> 
> Is there any specific restriction on where the crashkernel memory could
> be located.?

No. Have you tried "crashkernel=128M@0"? Can that reserve the memory
successfully?

> 
> Is it ok to be in the DMA32 region OR it should be beyond the DMA32
> region.
> 


The reserved memory is not visible to the first kernel, nor should it be
used by it, even if it is in the DMA32 zone.




* Re: Kexec & Memory Zones question
  2011-05-04 18:35 Kexec & Memory Zones question Sujit V
  2011-05-10  9:42 ` WANG Cong
@ 2011-05-11 15:09 ` Vivek Goyal
  2011-05-12 10:03   ` WANG Cong
  1 sibling, 1 reply; 6+ messages in thread
From: Vivek Goyal @ 2011-05-11 15:09 UTC (permalink / raw)
  To: Sujit V; +Cc: kexec

On Wed, May 04, 2011 at 11:35:46AM -0700, Sujit V wrote:
> On our x86_64 NUMA hardware running linux 2.6.23 with two memory nodes
>  have the following zone layout
> DMA   0 - 16MB
> DMA32   16MB to 4GB
> NORMAL 4GB to 96GB
> 
> We had the crashkernel boot param as 128M@16M.
> I am using kexec-tools-2.0.0
> 
> We observed that the reserved crashkernel memory was getting used by
> the system.
> We found out by kdb  that around 17MB of physical memory the memory
> contents were changing which proved that the system was
> using the memory which is actually reserved for crash kernel.
> [ If I load crash kernel using kexec then the system would crash. The
> back trace was always in the megasas driver. ]

This is problematic. This memory was reserved by the kernel in early
boot and nobody else should be using it. This sounds like a bug
somewhere.

> 
> My Question is
> 1) Should the crashkernel memory be located past the DMA32 zone.?
> 
> I have tried the following
> (1) crashkernel=128M@4GB  ( So the memory reservation is past DMA32)
> In this scenario the kexec tools gave an error "Could not find a free
> area of memory of xyz bytes"
> 
> (2) I changed the max of DMA32 to 1GB
>      crashkernel=128M@1G
> Still kexec gave the same error "Could not find a free area of memory
> of xyz bytes"

We have discussed this in the past and due to various reasons the max
amount of RAM you can boot your kernel from seems to be 896MB for
x86_64 and 512MB for 32bit. I shall have to open a previous thread
with hpa to get exact numbers. So loading kernel even higher is not
the solution.

We need to figure out who is using this memory, where the bug is,
and fix it.

Thanks
Vivek


* Re: Kexec & Memory Zones question
  2011-05-11 15:09 ` Vivek Goyal
@ 2011-05-12 10:03   ` WANG Cong
  2011-05-18  2:05     ` Sujit V
  0 siblings, 1 reply; 6+ messages in thread
From: WANG Cong @ 2011-05-12 10:03 UTC (permalink / raw)
  To: kexec

On Wed, 11 May 2011 11:09:08 -0400, Vivek Goyal wrote:

> We have discussed this in the past and due to various reasons the max
> amount of RAM you can boot your kernel from seems to be 896MB for x86_64
> and 512MB for 32bit. I shall have to open a previous thread with hpa to
> get exact numbers. So loading kernel even higher is not the solution.
> 

On the kexec-tools side, I think the limit is hard-coded,

./include/x86/x86-linux.h:250:#define DEFAULT_INITRD_ADDR_MAX 0x37FFFFFF

but we have,

        initrd_addr_max = DEFAULT_INITRD_ADDR_MAX;
        if (real_mode->protocol_version >= 0x0203) {
                initrd_addr_max = real_mode->initrd_addr_max;
                dbgprintf("initrd_addr_max is 0x%lx\n", initrd_addr_max);
        }


so, from the code, initrd_addr_max can be provided by the bootloader.
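
To relate the constant above to the 896MB figure quoted earlier in this message: 0x37FFFFFF is the last byte below 896MB. The following C sketch checks whether a region ends at or below such a limit. The helper `region_below_limit` is hypothetical, and whether this particular limit is what made kexec report "Could not find a free area of memory" in this case is an assumption, not something the thread confirms.

```c
#include <stdint.h>

/* Value quoted from kexec-tools above: 896MB - 1. */
#define DEFAULT_INITRD_ADDR_MAX 0x37FFFFFFULL

/*
 * Hypothetical helper: returns 1 if [base, base + size) lies entirely
 * at or below addr_max, 0 otherwise. Written to avoid overflow.
 */
int region_below_limit(uint64_t base, uint64_t size, uint64_t addr_max)
{
    if (size == 0 || base > addr_max)
        return 0;
    return size - 1 <= addr_max - base;
}
```

Under this check a 128M region at 16M fits below the default limit, while the same region at 1G or 4G does not.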

I remember on the kernel side there's also such a limit, but I can't
find where it is. I am wondering what prevents us from increasing this 
limit to 4G on i386 and even higher on x86_64.

Thanks.



* Re: Kexec & Memory Zones question
  2011-05-12 10:03   ` WANG Cong
@ 2011-05-18  2:05     ` Sujit V
  2011-05-18  2:40       ` WANG Cong
  0 siblings, 1 reply; 6+ messages in thread
From: Sujit V @ 2011-05-18  2:05 UTC (permalink / raw)
  To: WANG Cong; +Cc: kexec

We found the root cause for this issue in the bootmem allocator.

The 96GB NUMA system has two memory nodes each with 48GB.
node 0 had zone dma, dma32 & normal
node 1 had only zone normal.

During early boot (i.e. kernel/setup.c), the bootmem allocator uses the
find_free_area API on the e820 map to allocate some of its data
structures (i.e. the bitmap).
The bootmem bitmap tracks free and used pages with 1 bit per 4K page;
the reserve_bootmem() API is used to reserve pages.

The amount of memory required for the bitmap for node 0 with 48GB is
(48GB / (4K * 8)) = 1.5MB.

The free area of size 1.5MB returned from the e820 map was:
>> bitmap starts at  PA (0xf9b000) size 1.5MB
0xf9b000 + 1.5MB = 17.13MB

The bootmem bitmap therefore used a 1.13MB section of the intended
crashkernel reserved area.
Later, boot parameter parsing sees crashkernel=128M@16M and reserves
the area using reserve_bootmem(), even though part of it is already in
use by the bitmap.
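
The arithmetic above can be reproduced with a short C sketch. The helper names `bitmap_bytes` and `overlap_bytes` are hypothetical; the inputs (48GB node, 4K pages, bitmap at 0xf9b000, crashkernel region 128M@16M) are the ones given in this message, and binary units are assumed.

```c
#include <stdint.h>

/* Bytes needed for a bitmap with one bit per page, rounded up. */
uint64_t bitmap_bytes(uint64_t mem_bytes, uint64_t page_size)
{
    uint64_t pages = (mem_bytes + page_size - 1) / page_size;
    return (pages + 7) / 8;
}

/* Bytes shared by [a_start, a_end) and [b_start, b_end); 0 if disjoint. */
uint64_t overlap_bytes(uint64_t a_start, uint64_t a_end,
                       uint64_t b_start, uint64_t b_end)
{
    uint64_t lo = a_start > b_start ? a_start : b_start;
    uint64_t hi = a_end < b_end ? a_end : b_end;
    return hi > lo ? hi - lo : 0;
}
```

Calling `bitmap_bytes(48ULL << 30, 4096)` gives 1572864 bytes (1.5MB). Intersecting the bitmap range starting at 0xf9b000 with the 16MB to 144MB crashkernel range gives 1159168 bytes, about 1.1MB in binary units, which is close to the 1.13MB figure reported here.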


Later, when paging_init() is called, the bootmem allocator is retired.
At this point it frees the memory allocated to the bitmap and gives it
to the system page allocator,
i.e. pages from 16MB to 17.13MB are given to the system page
allocator (even though they are reserved for the crashkernel).

So pages in this memory range were handed out for system use.
When kexec loaded the kdump kernel into the 128M@16M range, it corrupted
that memory and we saw the system crash.

I fixed the bootmem allocator and then things worked correctly.


Ours is a 2.6.23 kernel.
Later kernel versions have other mechanisms for early memory
reservation (like early_res & memblock).


Thanks







* Re: Kexec & Memory Zones question
  2011-05-18  2:05     ` Sujit V
@ 2011-05-18  2:40       ` WANG Cong
  0 siblings, 0 replies; 6+ messages in thread
From: WANG Cong @ 2011-05-18  2:40 UTC (permalink / raw)
  To: kexec

On Tue, 17 May 2011 19:05:13 -0700, Sujit V wrote:

> We found the root cause for this issue in the bootmem allocator.
> 
> The 96GB NUMA system has two memory nodes each with 48GB. node 0 had
> zone dma, dma32 & normal
> node 1 had only zone normal.
> 
> During the early boot i.e kernel/setup.c The bootmem allocator uses the
> API find_free_area from the e820 map to allocate some of its data
> structures.[ i.e the bitmap ] (The bootmem bitmap is used to track free
> & used pages with 1bit for 4K page. The reserve_bootmem() API is used to
> reserve)
> 
> The amount of memory required to represent the bitmap for node 0 with
> 48GB is. (48GB / (4K * 8)) = 1.5MB
> 
> The start address of the free area of size 1.5 MB returned by e820 map
> was
>>> bitmap starts at  PA (0xf9b000) size 1.5MB
> 0xf9b000 + 1.5 MB = 17.13MB
> 
> The bootmem bitmap used the 1.13MB section from the supposed crashkernel
> reserved area.
> Later when boot param parsing looks at the crashkernel=128M@16M and
> reserves the area using the reserve_bootmem().
> 
> 
> Later when paging_init() is called the bootmem allocator is retired. At
> this point it free's the memory allocated to the bitmap & gives it to
> the system page allocator.
> i.e pages from 16MB to 17.13 MB are given to the system page allocator.
> (Even though the page is reserved by crashkernel.  ]
> 
> So pages in this memory range were given some system resources. When
> kexec loaded the kdump kernel in the 128M@16M range it corrupted that
> memory & we saw the system crash.
> 
> I fixed the boot mem allocator and then things worked correctly.


Yes, this is a bug in the bootmem allocator. Before the switch to memblock,
the old bootmem allocator marked the crashkernel reservation as exclusive,
which means it should not use any memory area already used by others; thus
in this case the crashkernel memory reservation should fail.
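
As a rough illustration of what an exclusive reservation means, here is a toy C model of a page bitmap in which an exclusive request fails if any page in the range is already reserved. All names (`toy_bootmem`, `toy_reserve`, `TOY_PAGES`) are hypothetical; this is not the kernel's bootmem code, only a sketch of the behavior described above.

```c
#include <stdint.h>

#define TOY_PAGES 64  /* hypothetical tiny address space, one bit per page */

struct toy_bootmem {
    uint8_t map[TOY_PAGES / 8];  /* bit set = page reserved */
};

/* Returns 1 if the page is already reserved, 0 otherwise. */
static int toy_test(const struct toy_bootmem *b, unsigned pfn)
{
    return (b->map[pfn / 8] >> (pfn % 8)) & 1;
}

/*
 * Reserve pages [start, end). With exclusive != 0, return -1 and leave
 * the map untouched if any page in the range is already reserved.
 */
int toy_reserve(struct toy_bootmem *b, unsigned start, unsigned end,
                int exclusive)
{
    unsigned pfn;

    if (start >= end || end > TOY_PAGES)
        return -1;
    if (exclusive)
        for (pfn = start; pfn < end; pfn++)
            if (toy_test(b, pfn))
                return -1;
    for (pfn = start; pfn < end; pfn++)
        b->map[pfn / 8] |= (uint8_t)(1u << (pfn % 8));
    return 0;
}
```

In this model, if an early allocation has already taken pages 15 to 17, an exclusive request for pages 16 to 47 is refused instead of silently sharing the overlap, which is the failure one would want to see at boot rather than later corruption.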

> 
> 
> Ours is a 2.6.23 kernel.
> The later versions of the kernel have some other mechanism for early
> memory reservation (like early_res & memblock)
> 

Right, I think that kernel version is still using the old bootmem
allocator, so you can change the crashkernel reservation to be
exclusive.


