* [CFT] ELF Relocatable x86 and x86_64 bzImages
       [not found]                 ` <m1d5c92yv4.fsf@ebiederm.dsl.xmission.com>
@ 2006-07-31 16:19                   ` Eric W. Biederman
  2006-07-31 20:25                     ` Vivek Goyal
                                       ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Eric W. Biederman @ 2006-07-31 16:19 UTC (permalink / raw)
  To: fastboot
  Cc: Jan Kratochvil, Magnus Damm, Horms, Vivek Goyal, Linda Wang,
	linux-kernel, H. Peter Anvin


I have spent some time and have gotten my relocatable kernel patches
working against the latest kernels.  I intend to push this upstream
shortly.

Could all of the people who care take a look and test this out
to make certain that it doesn't just work on my test box?

My approach is to extend bzImage so that it is an ET_DYN ELF executable
(we have what used to be a bootsector where we can put the header).
Boot loaders are explicitly not expected to process relocations.

The x86_64 kernel is simply built to live at a fixed virtual address
and the boot page tables are relocated.  The i386 kernel is built
to process relocations generated with --embedded-relocs (after vmlinux.lds.S
has been fixed up to sort out static and dynamic relocations).

Currently there are 33 patches in my tree to do this.

The weirdest symptom I have had so far is that page faults did not
trigger the early exception handler on x86_64 (instead I got a reboot).

The code should be available shortly at:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-reloc.git#reloc-v2.6.18-rc3

If all goes well with the testing I will push the patches to Andrew in the next couple 
of days.

Eric
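
A minimal sketch of what "an ET_DYN ELF executable" means to anything that inspects the image header. It only checks a generic 32-bit ELF header using the standard <elf.h> definitions. The helper name is_elf32_et_dyn is hypothetical and is not taken from the patches, and the sketch assumes the header is stored in the host's byte order.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if buf starts with a 32-bit ELF header whose type is ET_DYN,
 * 0 otherwise.  Only the header is inspected; no relocations are
 * processed, in line with what is expected of boot loaders above.
 * Assumption: the header fields are in host byte order.
 */
static int is_elf32_et_dyn(const unsigned char *buf, size_t len)
{
	Elf32_Ehdr ehdr;

	if (len < sizeof(ehdr))
		return 0;
	memcpy(&ehdr, buf, sizeof(ehdr));	/* avoid unaligned access */

	if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0)
		return 0;
	if (ehdr.e_ident[EI_CLASS] != ELFCLASS32)
		return 0;
	return ehdr.e_type == ET_DYN;
}
```

A loader that gets 1 back knows the image may be placed at an address other than the one it was linked for, subject to whatever alignment the image advertises.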

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-07-31 16:19                   ` [CFT] ELF Relocatable x86 and x86_64 bzImages Eric W. Biederman
@ 2006-07-31 20:25                     ` Vivek Goyal
  2006-07-31 21:00                       ` [Fastboot] " Vivek Goyal
  2006-08-04 21:08                     ` Don Zickus
  2006-08-25 20:16                       ` Vivek Goyal
  2 siblings, 1 reply; 46+ messages in thread
From: Vivek Goyal @ 2006-07-31 20:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: fastboot, Jan Kratochvil, Magnus Damm, Horms, Linda Wang,
	linux-kernel, H. Peter Anvin

On Mon, Jul 31, 2006 at 10:19:04AM -0600, Eric W. Biederman wrote:
> 
> I have spent some time and have gotten my relocatable kernel patches
> working against the latest kernels.  I intend to push this upstream
> shortly.
> 
> Could all of the people who care take a look and test this out
> to make certain that it doesn't just work on my test box?
> 
Hi Eric,

Currently I am testing your patches on i386. With CONFIG_RELOCATABLE=y
kernel boots fine and kexec also works.

But my kernel hangs in the kexec-on-panic case. It hangs early in
decompress_kernel(), at the following condition.

+       if (((u32)output - CONFIG_PHYSICAL_START) & 0x3fffff)
+               error("Destination address not 4M aligned");

I have reserved 64MB at 16M and kernel is loaded at 16M. 

I had expected to see "Destination address not 4M aligned" on the
serial console, but that did not happen. I had to add outb() calls to
track it down to this point.

Will look more into it.

Thanks
Vivek 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-07-31 20:25                     ` Vivek Goyal
@ 2006-07-31 21:00                       ` Vivek Goyal
  2006-08-01  2:31                         ` Eric W. Biederman
  0 siblings, 1 reply; 46+ messages in thread
From: Vivek Goyal @ 2006-07-31 21:00 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm,
	linux-kernel

On Mon, Jul 31, 2006 at 04:25:20PM -0400, Vivek Goyal wrote:
> On Mon, Jul 31, 2006 at 10:19:04AM -0600, Eric W. Biederman wrote:
> > 
> > I have spent some time and have gotten my relocatable kernel patches
> > working against the latest kernels.  I intend to push this upstream
> > shortly.
> > 
> > Could all of the people who care take a look and test this out
> > to make certain that it doesn't just work on my test box?
> > 
> Hi Eric,
> 
> Currently I am testing your patches on i386. With CONFIG_RELOCATABLE=y
> kernel boots fine and kexec also works.
> 
> But my kernel hangs in the kexec-on-panic case. It hangs early in
> decompress_kernel(), at the following condition.
> 
> +       if (((u32)output - CONFIG_PHYSICAL_START) & 0x3fffff)
> +               error("Destination address not 4M aligned");
> 

Ok. I am decompressing the kernel to 16MB, and after subtracting the 1MB
CONFIG_PHYSICAL_START I am left with 15MB, which is not 4M aligned,
hence I seem to be running into it.

I changed it to

if ((u32)output & 0x3fffff)

and the kdump kernel booted fine. But this will run into issues if I load
the kernel at 1MB.

I have a dumb question. Why do I have to load the kernel at 4MB alignment?
The existing kernel loads at 1MB, which is not 4MB aligned, and it works
fine?

Thanks
Vivek
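
The arithmetic above can be replayed outside the decompressor. This is a sketch, assuming CONFIG_PHYSICAL_START is 1MB as in the report; the function names are hypothetical, and the mask 0x3fffff (4MB - 1) is the one from the quoted check.

```c
#include <stdint.h>

#define MB             (1024u * 1024u)
#define PHYSICAL_START (1u * MB)	/* assumed CONFIG_PHYSICAL_START */
#define ALIGN_MASK     0x3fffffu	/* 4MB - 1, as in the quoted check */

/* Nonzero when the check from the patch would call error(). */
static int patch_check_fails(uint32_t output)
{
	return ((output - PHYSICAL_START) & ALIGN_MASK) != 0;
}

/* Nonzero when the locally modified check would call error(). */
static int modified_check_fails(uint32_t output)
{
	return (output & ALIGN_MASK) != 0;
}
```

With output at 16MB the patch check sees 15MB (0xf00000), whose low 22 bits are 0x300000, so it fails; the modified check passes at 16MB but fails at 1MB, which is the issue noted above.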

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-07-31 21:00                       ` [Fastboot] " Vivek Goyal
@ 2006-08-01  2:31                         ` Eric W. Biederman
  2006-08-01  2:34                           ` H. Peter Anvin
  2006-08-01  4:25                           ` Jan Kratochvil
  0 siblings, 2 replies; 46+ messages in thread
From: Eric W. Biederman @ 2006-08-01  2:31 UTC (permalink / raw)
  To: vgoyal
  Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm,
	linux-kernel

Vivek Goyal <vgoyal@in.ibm.com> writes:

> On Mon, Jul 31, 2006 at 04:25:20PM -0400, Vivek Goyal wrote:
>> On Mon, Jul 31, 2006 at 10:19:04AM -0600, Eric W. Biederman wrote:
>> > 
>> > I have spent some time and have gotten my relocatable kernel patches
>> > working against the latest kernels.  I intend to push this upstream
>> > shortly.
>> > 
>> > Could all of the people who care take a look and test this out
>> > to make certain that it doesn't just work on my test box?
>> > 
>> Hi Eric,
>> 
>> Currently I am testing your patches on i386. With CONFIG_RELOCATABLE=y
>> kernel boots fine and kexec also works.
>> 
>> But my kernel hangs in the kexec-on-panic case. It hangs early in
>> decompress_kernel(), at the following condition.
>> 
>> +       if (((u32)output - CONFIG_PHYSICAL_START) & 0x3fffff)
>> +               error("Destination address not 4M aligned");
>> 

As for the missing print.  Did you have an appropriate earlyprintk?

> Ok. I am decompressing the kernel to 16MB, and after subtracting the 1MB
> CONFIG_PHYSICAL_START I am left with 15MB, which is not 4M aligned,
> hence I seem to be running into it.
>
> I changed it to
>
> if ((u32)output & 0x3fffff)
>
> and the kdump kernel booted fine. But this will run into issues if I load
> the kernel at 1MB.
>
> I have a dumb question. Why do I have to load the kernel at 4MB alignment?
> The existing kernel loads at 1MB, which is not 4MB aligned, and it works
> fine?

4MB is a little harsh, but I haven't worked through what the exact rules
are.  I know 4MB is the worst case alignment for arch/i386.

The rule is that we have to be at the same offset from 4MB as we
were built to run at.  So in this case, an address where (address % 4MB) == 1MB.

We might be able to get away with 2MB alignment.  I thought kexec-tools
did that calculation automatically for an ET_DYN image but it has been
a while since I looked.

My goal with the check was to catch problems early before something
bad happened.

Eric
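
The rule "(address % 4MB) == 1MB" can be turned into a small helper that picks the lowest acceptable load address at or above a given minimum. This is only a sketch: the name next_load_address and the constants are illustrative, the 1MB build offset is the one discussed in this thread, and overflow near 4GB is not handled.

```c
#include <stdint.h>

#define MB           (1024u * 1024u)
#define ALIGNMENT    (4u * MB)	/* worst-case i386 alignment */
#define BUILD_OFFSET (1u * MB)	/* offset the kernel was built to run at */

/*
 * Lowest address >= min_addr that satisfies
 * (address % ALIGNMENT) == BUILD_OFFSET.
 */
static uint32_t next_load_address(uint32_t min_addr)
{
	uint32_t base = min_addr - (min_addr % ALIGNMENT);	/* round down */
	uint32_t candidate = base + BUILD_OFFSET;

	if (candidate < min_addr)
		candidate += ALIGNMENT;
	return candidate;
}
```

For a crash-kernel region that starts at 16MB this yields 17MB, so the image would have to sit 1MB into the reserved area to pass the check.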


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-01  2:31                         ` Eric W. Biederman
@ 2006-08-01  2:34                           ` H. Peter Anvin
  2006-08-01  3:44                             ` Eric W. Biederman
  2006-08-01  4:25                           ` Jan Kratochvil
  1 sibling, 1 reply; 46+ messages in thread
From: H. Peter Anvin @ 2006-08-01  2:34 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: vgoyal, fastboot, Horms, Jan Kratochvil, Magnus Damm, linux-kernel

Eric W. Biederman wrote:
> 
>> Ok. I am decompressing the kernel to 16MB, and after subtracting the 1MB
>> CONFIG_PHYSICAL_START I am left with 15MB, which is not 4M aligned,
>> hence I seem to be running into it.
>>
>> I changed it to
>>
>> if ((u32)output & 0x3fffff)
>>
>> and the kdump kernel booted fine. But this will run into issues if I load
>> the kernel at 1MB.
>>
>> I have a dumb question. Why do I have to load the kernel at 4MB alignment?
>> The existing kernel loads at 1MB, which is not 4MB aligned, and it works
>> fine?
> 
> 4MB is a little harsh, but I haven't worked through what the exact rules
> are, I know 4MB is the worst case alignment for arch/i386.
> 

4 MB would be worst case for i386; 2 MB for x86-64.  Actually the x86-64 
worst case would be gigabyte, but that's more than a little bit extreme.

	-hpa

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-01  2:34                           ` H. Peter Anvin
@ 2006-08-01  3:44                             ` Eric W. Biederman
  0 siblings, 0 replies; 46+ messages in thread
From: Eric W. Biederman @ 2006-08-01  3:44 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: vgoyal, fastboot, Horms, Jan Kratochvil, Magnus Damm, linux-kernel

"H. Peter Anvin" <hpa@zytor.com> writes:

> Eric W. Biederman wrote:
>>
>>> Ok. I am decompressing the kernel to 16MB, and after subtracting the 1MB
>>> CONFIG_PHYSICAL_START I am left with 15MB, which is not 4M aligned,
>>> hence I seem to be running into it.
>>>
>>> I changed it to
>>>
>>> if ((u32)output & 0x3fffff)
>>>
>>> and the kdump kernel booted fine. But this will run into issues if I load
>>> the kernel at 1MB.
>>>
>>> I have a dumb question. Why do I have to load the kernel at 4MB alignment?
>>> The existing kernel loads at 1MB, which is not 4MB aligned, and it works
>>> fine?
>> 4MB is a little harsh, but I haven't worked through what the exact rules
>> are, I know 4MB is the worst case alignment for arch/i386.
>>
>
> 4 MB would be worst case for i386; 2 MB for x86-64.  Actually the x86-64 worst
> case would be gigabyte, but that's more than a little bit extreme.

Yep, and that is what I test for, except for the gigabyte case which we don't
currently implement.  Although I can imagine that gigabyte pages might
be interesting for the identity mapped part of the page table.

Eric

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-01  2:31                         ` Eric W. Biederman
  2006-08-01  2:34                           ` H. Peter Anvin
@ 2006-08-01  4:25                           ` Jan Kratochvil
  2006-08-01  9:09                             ` Eric W. Biederman
  1 sibling, 1 reply; 46+ messages in thread
From: Jan Kratochvil @ 2006-08-01  4:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: vgoyal, fastboot, Horms, H. Peter Anvin, Magnus Damm, linux-kernel

On Tue, 01 Aug 2006 04:31:43 +0200, Eric W. Biederman wrote:
...
> 4MB is a little harsh, but I haven't worked through what the exact rules
> are, I know 4MB is the worst case alignment for arch/i386.
> 
> The rule is that we have to be at the same offset from 4MB as we
> were built to run at.  So in this case address where (address %4MB) == 1MB.

In such case your patch is not optimal.  The original VA Linux Japan patch 2.0
	http://mkdump.sourceforge.net/cvs.html
	cvs -q -z3 -d:pserver:anonymous:@mkdump.cvs.sourceforge.net:/cvsroot/mkdump rdiff -u -r bp_linux-2_6-minik -r linux-2_6-minik linux
had lower alignment requirements, and these were really tested at that time.

i386 had alignment requirement:
	/* current_thread_info()&co. are 8192-alignment fixed (for the initial stack). */
	#if CONFIG_PHYSICAL_START & 0x1FFF
	#error "CONFIG_PHYSICAL_START must be 2*PAGE_SIZE (0x2000) aligned!"
	#endif
as IIRC those i386 2MB/4MB pages must be (apparently) 2MB/4MB aligned in the
virtual address space but their physical target address can be arbitrary.

and x86_64 alignment requirement was:
	#if (CONFIG_PHYSICAL_START - 0x100000) & 0x1FFFFF
	#error "CONFIG_PHYSICAL_START must be '2MB * x + 1MB' aligned!"
	#endif
while IIRC those x86_64 2MB pages need to have even the physical target address
2MB aligned.  Lower alignment would require suboptimal execution by not using
the 2MB pages (and the patch would have to handle it appropriately).

( I did not check your patches as they are locked in that useless GIT anyway. )


Regards,
Lace

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-01  4:25                           ` Jan Kratochvil
@ 2006-08-01  9:09                             ` Eric W. Biederman
  2006-08-01  9:43                               ` Jan Kratochvil
  0 siblings, 1 reply; 46+ messages in thread
From: Eric W. Biederman @ 2006-08-01  9:09 UTC (permalink / raw)
  To: Jan Kratochvil
  Cc: vgoyal, fastboot, Horms, H. Peter Anvin, Magnus Damm, linux-kernel

Jan Kratochvil <lace@jankratochvil.net> writes:

> On Tue, 01 Aug 2006 04:31:43 +0200, Eric W. Biederman wrote:
> ...
>> 4MB is a little harsh, but I haven't worked through what the exact rules
>> are, I know 4MB is the worst case alignment for arch/i386.
>> 
>> The rule is that we have to be at the same offset from 4MB as we
>> were built to run at.  So in this case address where (address %4MB) == 1MB.
>
> In such case your patch is not optimal.  The original VA Linux Japan patch 2.0
> 	http://mkdump.sourceforge.net/cvs.html
> 	cvs -q -z3 -d:pserver:anonymous:@mkdump.cvs.sourceforge.net:/cvsroot/mkdump rdiff -u -r bp_linux-2_6-minik -r linux-2_6-minik linux
> had lower alignment requirements and these were really tested that time.
>
> i386 had alignment requirement:
> 	/* current_thread_info()&co. are 8192-alignment fixed (for the initial stack). */
> 	#if CONFIG_PHYSICAL_START & 0x1FFF
> 	#error "CONFIG_PHYSICAL_START must be 2*PAGE_SIZE (0x2000) aligned!"
> 	#endif
> as IIRC those i386 2MB/4MB pages must be (apparently) 2MB/4MB aligned in the
> virtual address space but their physical target address can be arbitrary.

I know you can't use huge pages if your physical address is not
properly aligned.   Which can be a performance impact if nothing else.
Not something I want to encourage in a general purpose kernel. 

If it is actually a problem once we get past the user confusion
aspect of this I will happily revisit it.  The big confusion in all of
this is that with a 4MB alignment and a 1MB offset the useable cases
are: 1MB, 5MB, 9MB, 13MB, 17MB, 21MB... 


What I did that is rather unique is I actually enforce this in misc.c
so there is no way we can slip by our alignment requirements.

I'm not terribly comfortable with the 8K alignment number as we only
tell the linker we need 4K alignment.  So there might be other
implicit things out there as well.  Although I admit head.S may be
the only place we can get away with that kind of thing, as the linker
can move everything else around.  Groan yet another kernel audit if
we go this route.

> and x86_64 alignment requirement was:
> 	#if (CONFIG_PHYSICAL_START - 0x100000) & 0x1FFFFF
> 	#error "CONFIG_PHYSICAL_START must be '2MB * x + 1MB' aligned!"
> 	#endif
> while IIRC those x86_64 2MB pages need to have even the physical target address
> 2MB aligned.  Lower alignment would require suboptimal execution by not using
> the 2MB pages (and the patch would have to handle it appropriately).

Yes. I have that check.  Except now the check really is
(CONFIG_PHYSICAL_START & 0x1FFFFF) == 0 because the x86_64 kernel lives at
2MB by default now, so it can really get the benefit of huge pages.

> ( I did not check your patches as they are locked in that useless GIT anyway. )
( As opposed to the unusable CVS I presume :)

I guess I should just post them so we can have a sane conversation :)

Eric
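
The two x86_64 rules mentioned above differ only in the offset they expect. A sketch that expresses both as runtime predicates so they can be compared; the function names are hypothetical, and the constants are the ones quoted in this exchange.

```c
#include <stdint.h>

/* Older rule from the quoted #if: start must be 2MB * x + 1MB. */
static int old_x86_64_rule_ok(uint64_t physical_start)
{
	return ((physical_start - 0x100000u) & 0x1FFFFFu) == 0;
}

/* Rule described above: start must be 2MB aligned. */
static int new_x86_64_rule_ok(uint64_t physical_start)
{
	return (physical_start & 0x1FFFFFu) == 0;
}
```

An address such as 0x200000 (2MB) satisfies only the newer rule, and 0x100000 (1MB) satisfies only the older one, so a load address computed for one rule is never valid for the other.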


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-01  9:09                             ` Eric W. Biederman
@ 2006-08-01  9:43                               ` Jan Kratochvil
  2006-08-01 11:28                                 ` Eric W. Biederman
  0 siblings, 1 reply; 46+ messages in thread
From: Jan Kratochvil @ 2006-08-01  9:43 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: vgoyal, fastboot, Horms, H. Peter Anvin, Magnus Damm, linux-kernel

On Tue, 01 Aug 2006 11:09:28 +0200, Eric W. Biederman wrote:
> Jan Kratochvil <lace@jankratochvil.net> writes:
...
> > i386 had alignment requirement:
> > 	/* current_thread_info()&co. are 8192-alignment fixed (for the initial stack). */
> > 	#if CONFIG_PHYSICAL_START & 0x1FFF
> > 	#error "CONFIG_PHYSICAL_START must be 2*PAGE_SIZE (0x2000) aligned!"
> > 	#endif
> > as IIRC those i386 2MB/4MB pages must be (apparently) 2MB/4MB aligned in the
> > virtual address space but their physical target address can be arbitrary.
> 
> I know you can't use huge pages if your physical address is not
> properly aligned.   Which can be a performance impact if nothing else.
> Not something I want to encourage in a general purpose kernel. 

So you would rather crash than run with that unmeasurably lower performance?

IIRC those 2MB/4MB pages performance "gain" is still present (in my patch)
even if the kernel location is not 2MB/4MB aligned because the i386 2MB/4MB
pagetable entries can have arbitrary physical memory target address.
But maybe I lie here, sorry, I really do not remember it much.
(It 100% worked with the "full performance" if aligned and it "worked" if
unaligned but I do not remember if it worked "full performance" if unaligned.)

...
> I'm not terribly comfortable with the 8K alignment number as we only
> tell the linker we need 4K alignment.

Yes, it should be fixed there so that the stacks get allocated 8KB-aligned not
depending on the kernel code position at all.  That means allocating the
initial stack by code and not relying on its autoallocation by the linker.
There would remain the 4KB alignment requirement due to the physical target
address of the pagetable entries.

...
> > ( I did not check your patches as they are locked in that useless GIT anyway. )
> ( As opposed to the unusable CVS I presume :)

Yes, it has the same unusability as CVS, just it loses the feature of being
the standard.  I assume some CVS flamewar already occurred some time ago.


Regards,
Lace

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-01  9:43                               ` Jan Kratochvil
@ 2006-08-01 11:28                                 ` Eric W. Biederman
  0 siblings, 0 replies; 46+ messages in thread
From: Eric W. Biederman @ 2006-08-01 11:28 UTC (permalink / raw)
  To: Jan Kratochvil
  Cc: vgoyal, fastboot, Horms, H. Peter Anvin, Magnus Damm, linux-kernel

Jan Kratochvil <lace@jankratochvil.net> writes:

> So you would rather crash than run with that unmeasurably lower performance?

No, I would simply rather not boot than run something I'm not certain will
work.  If we align things deliberately for better performance I don't
want to cope with that either.

> ...
>> I'm not terribly comfortable with the 8K alignment number as we only
>> tell the linker we need 4K alignment.
>
> Yes, it should be fixed there so that the stacks get allocated 8KB-aligned not
> depending on the kernel code position at all.  That means allocating the
> initial stack by code and not relying on its autoallocation by the linker.
> There would remain the 4KB alignment requirement due to the physical target
> address of the pagetable entries.

So thinking about this.  By processing relocations we end up with
no page table related relocation restrictions except that we must be
within the identity mapped page table area.  Not even the 4KB is
directly a page table related alignment restriction.

So the right answer is to review the arch/i386 kernel and make certain
we don't have any implicit alignment requirements (and if we do,
make them explicit so the linker will honor and report them).  At
which point all I need to do is to copy the required alignment from
vmlinux to the ELF header of the bzImage.

>> > ( I did not check your patches as they are locked in that useless GIT anyway. )

For code review sending patches is still the best way to do it.
Patches in email are easier to comment on, and require less work
for people to actually look at.  So since you have complained
I have sent out all of the patches.

My evil plan is to keep making interesting things available in GIT
until it is no longer considered useless :)

Eric
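
One way to read "copy the required alignment from vmlinux" is to take the largest p_align among the PT_LOAD program headers. This is a sketch under that assumption, using the standard <elf.h> types; required_alignment is a hypothetical name and nothing here is taken from the patches.

```c
#include <elf.h>
#include <stddef.h>

/*
 * Largest p_align among the PT_LOAD segments, or 1 if there are none.
 * ELF alignments are powers of two, so the largest value is also a
 * common multiple of all the others.
 */
static Elf32_Word required_alignment(const Elf32_Phdr *phdr, size_t count)
{
	Elf32_Word align = 1;
	size_t i;

	for (i = 0; i < count; i++) {
		if (phdr[i].p_type != PT_LOAD)
			continue;	/* only loadable segments constrain placement */
		if (phdr[i].p_align > align)
			align = phdr[i].p_align;
	}
	return align;
}
```

This only works once every implicit requirement has been made explicit to the linker, which is the audit described above.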


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-07-31 16:19                   ` [CFT] ELF Relocatable x86 and x86_64 bzImages Eric W. Biederman
  2006-07-31 20:25                     ` Vivek Goyal
@ 2006-08-04 21:08                     ` Don Zickus
  2006-08-04 21:25                       ` Eric W. Biederman
  2006-08-25 20:16                       ` Vivek Goyal
  2 siblings, 1 reply; 46+ messages in thread
From: Don Zickus @ 2006-08-04 21:08 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm,
	linux-kernel

On Mon, Jul 31, 2006 at 10:19:04AM -0600, Eric W. Biederman wrote:
> 
> I have spent some time and have gotten my relocatable kernel patches
> working against the latest kernels.  I intend to push this upstream
> shortly.
> 
> Could all of the people who care take a look and test this out
> to make certain that it doesn't just work on my test box?

Is there any reason I would get the following error on x86_64 using your patches?

 Filesystem type is ext2fs, partition type 0x83
kernel /bzImage ro root=LABEL=/1 console=ttyS0,115200
earlyprintk=ttyS0,115200
   [Linux-bzImage, setup=0x1c00, size=0x24917c]
initrd /initrd-2.6.18-rc3.img
   [Linux-initrd @ 0x37e0d000, 0x1e25e7 bytes]

.
Decompressing Linux...

length error

 -- System halted


I can get i386 to boot fine.  I can't for the life of me figure out what I
am doing wrong..

Cheers,
Don


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-04 21:08                     ` Don Zickus
@ 2006-08-04 21:25                       ` Eric W. Biederman
  2006-08-04 23:43                         ` Don Zickus
  0 siblings, 1 reply; 46+ messages in thread
From: Eric W. Biederman @ 2006-08-04 21:25 UTC (permalink / raw)
  To: Don Zickus
  Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm,
	linux-kernel

Don Zickus <dzickus@redhat.com> writes:

> On Mon, Jul 31, 2006 at 10:19:04AM -0600, Eric W. Biederman wrote:
>> 
>> I have spent some time and have gotten my relocatable kernel patches
>> working against the latest kernels.  I intend to push this upstream
>> shortly.
>> 
>> Could all of the people who care take a look and test this out
>> to make certain that it doesn't just work on my test box?
>
> Is there any reason I would get the following error on x86_64 using your patches?

There shouldn't be.

>  Filesystem type is ext2fs, partition type 0x83
> kernel /bzImage ro root=LABEL=/1 console=ttyS0,115200
> earlyprintk=ttyS0,115200
>    [Linux-bzImage, setup=0x1c00, size=0x24917c]
> initrd /initrd-2.6.18-rc3.img
>    [Linux-initrd @ 0x37e0d000, 0x1e25e7 bytes]
>
> .
> Decompressing Linux...
>
> length error
>
>  -- System halted
>
>
> I can get i386 to boot fine.  I can't for the life of me figure out what I
> am doing wrong..

The length error comes from lib/inflate.c 

I think it would be interesting to look at orig_len and bytes_out.

My hunch is that I have tripped over a tool chain bug or a weird
alignment issue.

The error is that the uncompressed length does not match the stored length
of the data from before we compressed it.  Now what is
fascinating is that our crc's match (as that check is performed first).

Something is very slightly off and I don't see what it is.

After looking at the state variables I would probably start looking
at the uncompressed data to see if it really was decompressing
properly.  If nothing else that is the kind of process that would tend
to spark a clue.

Eric
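
The order of the two checks can be illustrated with the gzip trailer, which stores a CRC-32 followed by the uncompressed length, both little-endian. This is a simplified stand-alone sketch and not the code in lib/inflate.c; the names check_trailer, crc32_update and the status enum are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), as used by gzip. */
static uint32_t crc32_update(uint32_t crc, const unsigned char *p, size_t n)
{
	int k;

	crc = ~crc;
	while (n--) {
		crc ^= *p++;
		for (k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & ((uint32_t)0 - (crc & 1u)));
	}
	return ~crc;
}

/* Read a 32-bit little-endian value. */
static uint32_t le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

enum trailer_status { TRAILER_OK, TRAILER_CRC_ERROR, TRAILER_LENGTH_ERROR };

/*
 * Compare the running CRC and byte count kept while inflating against
 * the 8-byte trailer.  The CRC is checked first, so a length error
 * implies that the CRC already matched.
 */
static enum trailer_status check_trailer(const unsigned char trailer[8],
					 uint32_t computed_crc,
					 uint32_t bytes_out)
{
	uint32_t orig_crc = le32(trailer);
	uint32_t orig_len = le32(trailer + 4);

	if (orig_crc != computed_crc)
		return TRAILER_CRC_ERROR;
	if (orig_len != bytes_out)
		return TRAILER_LENGTH_ERROR;
	return TRAILER_OK;
}
```

Because the CRC and the byte count are tracked separately, a matching CRC with a wrong count points at the counter or at something that changed after the data was written, not at the inflated bytes that were summed.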


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-04 21:25                       ` Eric W. Biederman
@ 2006-08-04 23:43                         ` Don Zickus
  2006-08-05  7:49                           ` Eric W. Biederman
  2006-08-05 16:07                           ` Eric W. Biederman
  0 siblings, 2 replies; 46+ messages in thread
From: Don Zickus @ 2006-08-04 23:43 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm,
	linux-kernel

> The length error comes from lib/inflate.c 
> 
> I think it would be interesting to look at orig_len and bytes_out.
> 
> My hunch is that I have tripped over a tool chain bug or a weird
> alignment issue.

I thought so too, but I took vmlinuz images from people (Vivek) who had it
boot on their systems but those images still failed on my two machines.  

> 
> The error is that the uncompressed length does not match the stored length
> of the data from before we compressed it.  Now what is
> fascinating is that our crc's match (as that check is performed first).
> 
> Something is very slightly off and I don't see what it is.

I printed out orig_len -> 5910532 (which matches vmlinux.bin)
             bytes_out -> 5910531

> 
> After looking at the state variables I would probably start looking
> at the uncompressed data to see if it really was decompressing
> properly.  If nothing else that is the kind of process that would tend
> to spark a clue.

I am not familiar with the code, so very few sparks are flying.  I'll
still dig through though.  Thanks for the tips.

Cheers,
Don

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-04 23:43                         ` Don Zickus
@ 2006-08-05  7:49                           ` Eric W. Biederman
  2006-08-05 16:07                           ` Eric W. Biederman
  1 sibling, 0 replies; 46+ messages in thread
From: Eric W. Biederman @ 2006-08-05  7:49 UTC (permalink / raw)
  To: Don Zickus
  Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm,
	linux-kernel

Don Zickus <dzickus@redhat.com> writes:

>> The length error comes from lib/inflate.c 
>> 
>> I think it would be interesting to look at orig_len and bytes_out.
>> 
>> My hunch is that I have tripped over a tool chain bug or a weird
>> alignment issue.
>
> I thought so too, but I took vmlinuz images from people (Vivek) who had it
> boot on their systems but those images still failed on my two machines.  

Odd.  That might narrow things down.  This is just booting with grub
so there is no relocation specific weirdness coming into play.

>> The error is that the uncompressed length does not match the stored length
>> of the data from before we compressed it.  Now what is
>> fascinating is that our crc's match (as that check is performed first).
>> 
>> Something is very slightly off and I don't see what it is.
>
> I printed out orig_len -> 5910532 (which matches vmlinux.bin)
>              bytes_out -> 5910531

Is the last byte of vmlinux.bin 0?

One byte off certainly fits my pattern of something slightly off.

>> After looking at the state variables I would probably start looking
>> at the uncompressed data to see if it really was decompressing
>> properly.  If nothing else that is the kind of process that would tend
>> to spark a clue.
>
> I am not familiar with the code, so very few sparks are flying.  I'll
> still dig through though.  Thanks for the tips.

Welcome.

Eric

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-04 23:43                         ` Don Zickus
  2006-08-05  7:49                           ` Eric W. Biederman
@ 2006-08-05 16:07                           ` Eric W. Biederman
  2006-08-07 17:44                             ` Don Zickus
  1 sibling, 1 reply; 46+ messages in thread
From: Eric W. Biederman @ 2006-08-05 16:07 UTC (permalink / raw)
  To: Don Zickus
  Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm,
	linux-kernel

Don Zickus <dzickus@redhat.com> writes:

>> The length error comes from lib/inflate.c 
>> 
>> I think it would be interesting to look at orig_len and bytes_out.
>> 
>> My hunch is that I have tripped over a tool chain bug or a weird
>> alignment issue.
>
> I thought so too, but I took vmlinuz images from people (Vivek) who had it
> boot on their systems but those images still failed on my two machines.  
>
>> 
>> The error is that the uncompressed length does not match the stored length
>> of the data from before we compressed it.  Now what is
>> fascinating is that our crc's match (as that check is performed first).
>> 
>> Something is very slightly off and I don't see what it is.
>
> I printed out orig_len -> 5910532 (which matches vmlinux.bin)
>              bytes_out -> 5910531
>
>> 
>> After looking at the state variables I would probably start looking
>> at the uncompressed data to see if it really was decompressing
>> properly.  If nothing else that is the kind of process that would tend
>> to spark a clue.
>
> I am not familiar with the code, so very few sparks are flying.  I'll
> still dig through though.  Thanks for the tips.

I guess the interesting thing to do would be to 
- Recompute the crc to see if we still match.
- Possibly instrument flush_window.

I have a strange feeling that the uncompressed data is getting corrupted
after we have flushed the window.

Eric



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-05 16:07                           ` Eric W. Biederman
@ 2006-08-07 17:44                             ` Don Zickus
  2006-08-07 18:08                               ` Eric W. Biederman
  0 siblings, 1 reply; 46+ messages in thread
From: Don Zickus @ 2006-08-07 17:44 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm,
	linux-kernel

On Sat, Aug 05, 2006 at 10:07:01AM -0600, Eric W. Biederman wrote:
> Don Zickus <dzickus@redhat.com> writes:
> 
> >> The length error comes from lib/inflate.c 
> >> 
> >> I think it would be interesting to look at orig_len and bytes_out.
> >> 
> >> My hunch is that I have tripped over a tool chain bug or a weird
> >> alignment issue.
> >
> > I thought so too, but I took vmlinuz images from people (Vivek) who had it
> > boot on their systems but those images still failed on my two machines.  
> >
> >> 
> >> The error is that the uncompressed length does not match the stored length
> >> of the data from before we compressed it.  Now what is
> >> fascinating is that our crc's match (as that check is performed first).
> >> 
> >> Something is very slightly off and I don't see what it is.
> >
> > I printed out orig_len -> 5910532 (which matches vmlinux.bin)
> >              bytes_out -> 5910531
> >
> >> 
> >> After looking at the state variables I would probably start looking
> >> at the uncompressed data to see if it really was decompressing
> >> properly.  If nothing else that is the kind of process that would tend
> >> to spark a clue.
> >
> > I am not familiar with the code, so very few sparks are flying.  I'll
> > still dig through though.  Thanks for the tips.
> 
> I guess the interesting things to do would be to:
> - Recompute the CRC to see if we still match.
> - Possibly instrument flush_window().
> 
> I have a strange feeling that the uncompressed data is getting corrupted
> after we have flushed the window.

It seems to be an AMD64 vs EM64T problem.  AMD chipsets work but Intel
chipsets don't.  

I also blindly incremented bytes_out (as a really cheap hack), it didn't
work until I added some random putstr's below it (timing??).  Then the
kernel booted. 

Still looking into things.  

Cheers,
Don

> 
> Eric
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-07 17:44                             ` Don Zickus
@ 2006-08-07 18:08                               ` Eric W. Biederman
  2006-08-07 23:57                                 ` Don Zickus
  0 siblings, 1 reply; 46+ messages in thread
From: Eric W. Biederman @ 2006-08-07 18:08 UTC (permalink / raw)
  To: Don Zickus
  Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm,
	linux-kernel

Don Zickus <dzickus@redhat.com> writes:

> On Sat, Aug 05, 2006 at 10:07:01AM -0600, Eric W. Biederman wrote:
>> Don Zickus <dzickus@redhat.com> writes:
>> 
>> >> The length error comes from lib/inflate.c 
>> >> 
>> >> I think it would be interesting to look at orig_len and bytes_out.
>> >> 
>> >> My hunch is that I have tripped over a tool chain bug or a weird
>> >> alignment issue.
>> >
>> > I thought so too, but I took vmlinuz images from people (Vivek) who had it
>> > boot on their systems but those images still failed on my two machines.  
>> >
>> >> 
>> >> The error is that the uncompressed length does not match the stored length
>> >> of the data from before we compressed it.  Now what is
>> >> fascinating is that our CRCs match (as that check is performed first).
>> >> 
>> >> Something is very slightly off and I don't see what it is.
>> >
>> > I printed out orig_len -> 5910532 (which matches vmlinux.bin)
>> >              bytes_out -> 5910531
>> >
>> >> 
>> >> After looking at the state variables I would probably start looking
>> >> at the uncompressed data to see if it really was decompressing
>> >> properly.  If nothing else that is the kind of process that would tend
>> >> to spark a clue.
>> >
>> > I am not familiar with the code, so very few sparks are flying.  I'll
>> > still dig through though.  Thanks for the tips.
>> 
>> I guess the interesting things to do would be to:
>> - Recompute the CRC to see if we still match.
>> - Possibly instrument flush_window().
>> 
>> I have a strange feeling that the uncompressed data is getting corrupted
>> after we have flushed the window.
>
> It seems to be an AMD64 vs EM64T problem.  AMD chipsets work but Intel
> chipsets don't.  
>
> I also blindly incremented bytes_out (as a really cheap hack), it didn't
> work until I added some random putstr's below it (timing??).  Then the
> kernel booted. 
>
> Still looking into things.  

Odd.  I wonder if I'm missing a serializing instruction somewhere,
to ensure the effects of ``self modifying code'' aren't a problem.
As I read Intel's documentation, if you have a jump before you get
to the code there shouldn't be a problem.

Still that doesn't really explain bytes_out.


Eric

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-07 18:08                               ` Eric W. Biederman
@ 2006-08-07 23:57                                 ` Don Zickus
  2006-08-08  5:01                                   ` Eric W. Biederman
  2006-08-08 23:36                                   ` Andi Kleen
  0 siblings, 2 replies; 46+ messages in thread
From: Don Zickus @ 2006-08-07 23:57 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm,
	linux-kernel

> >> >
> >> >> 
> >> >> The error is that the uncompressed length does not match the stored length
> >> >> of the data from before we compressed it.  Now what is
> >> >> fascinating is that our CRCs match (as that check is performed first).
> >> >> 
> >> >> Something is very slightly off and I don't see what it is.
> >> >
> >> > I printed out orig_len -> 5910532 (which matches vmlinux.bin)
> >> >              bytes_out -> 5910531
> >> >
> >> >> 
> > It seems to be an AMD64 vs EM64T problem.  AMD chipsets work but Intel
> > chipsets don't.  
> >
> > I also blindly incremented bytes_out (as a really cheap hack), it didn't
> > work until I added some random putstr's below it (timing??).  Then the
> > kernel booted. 
> >
> > Still looking into things.  
> 
> Odd.  I wonder if I'm missing a serializing instruction somewhere,
> to ensure the effects of ``self modifying code'' aren't a problem.
> As I read Intel's documentation, if you have a jump before you get
> to the code there shouldn't be a problem.
> 
> Still that doesn't really explain bytes_out.
> 

So I narrowed down the problem but it isn't obvious to me why this problem
exists.  Basically, even though bytes_out is supposed to be initialized to
0, it becomes -1 before entering decompress_kernel().  Of course, the
fallout is that in flush_window() bytes_out winds up being one less than
outcnt, hence my original problem.

Any thoughts on how to debug where this could be getting corrupted?  

Cheers,
Don


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-07 23:57                                 ` Don Zickus
@ 2006-08-08  5:01                                   ` Eric W. Biederman
  2006-08-08 19:36                                     ` Don Zickus
  2006-08-09 20:06                                     ` Don Zickus
  2006-08-08 23:36                                   ` Andi Kleen
  1 sibling, 2 replies; 46+ messages in thread
From: Eric W. Biederman @ 2006-08-08  5:01 UTC (permalink / raw)
  To: Don Zickus
  Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm,
	linux-kernel

Don Zickus <dzickus@redhat.com> writes:

>> >> >
>> >> >> 
>> >> >> The error is that the uncompressed length does not match the stored length
>> >> >> of the data from before we compressed it.  Now what is
>> >> >> fascinating is that our CRCs match (as that check is performed first).
>> >> >> 
>> >> >> Something is very slightly off and I don't see what it is.
>> >> >
>> >> > I printed out orig_len -> 5910532 (which matches vmlinux.bin)
>> >> >              bytes_out -> 5910531
>> >> >
>> >> >> 
>> > It seems to be an AMD64 vs EM64T problem.  AMD chipsets work but Intel
>> > chipsets don't.  
>> >
>> > I also blindly incremented bytes_out (as a really cheap hack), it didn't
>> > work until I added some random putstr's below it (timing??).  Then the
>> > kernel booted. 
>> >
>> > Still looking into things.  
>> 
>> Odd.  I wonder if I'm missing a serializing instruction somewhere,
>> to ensure the effects of ``self modifying code'' aren't a problem.
>> As I read Intel's documentation, if you have a jump before you get
>> to the code there shouldn't be a problem.
>> 
>> Still that doesn't really explain bytes_out.
>> 
>
> So I narrowed down the problem but it isn't obvious to me why this problem
> exists.  Basically, even though bytes_out is supposed to be initialized to
> 0, it becomes -1 before entering decompress_kernel().  Of course, the
> fallout is that in flush_window() bytes_out winds up being one less than
> outcnt, hence my original problem.
>
> Any thoughts on how to debug where this could be getting corrupted?  

Looking at my build it appears bytes_out is being placed in the .bss.
A little odd since it is zero initialized but no big deal.
Could you confirm that bytes_out is being placed in the .bss section
by inspecting arch/x86_64/boot/compressed/misc.o and
arch/x86_64/boot/compressed/vmlinux?  My technique was to run
"readelf -a $file", look up the symbol's section number, and then check
the section table to see which section that is.

If bytes_out is in the .bss for you then I suspect something is not
correctly zeroing the .bss.  Or else the .bss is being stomped.

I'm not certain how rep stosb can be done wrong but some bad pointer
math could have done it.

Eric

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-08  5:01                                   ` Eric W. Biederman
@ 2006-08-08 19:36                                     ` Don Zickus
  2006-08-09 20:06                                     ` Don Zickus
  1 sibling, 0 replies; 46+ messages in thread
From: Don Zickus @ 2006-08-08 19:36 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm,
	linux-kernel

On Mon, Aug 07, 2006 at 11:01:53PM -0600, Eric W. Biederman wrote:
> Don Zickus <dzickus@redhat.com> writes:
> 
> >> >> >
> >> >> >> 
> >> >> >> The error is that the uncompressed length does not match the stored length
> >> >> >> of the data from before we compressed it.  Now what is
> >> >> >> fascinating is that our CRCs match (as that check is performed first).
> >> >> >> 
> >> >> >> Something is very slightly off and I don't see what it is.
> >> >> >
> >> >> > I printed out orig_len -> 5910532 (which matches vmlinux.bin)
> >> >> >              bytes_out -> 5910531
> >> >> >
> >> >> >> 
> >> > It seems to be an AMD64 vs EM64T problem.  AMD chipsets work but Intel
> >> > chipsets don't.  
> >> >
> >> > I also blindly incremented bytes_out (as a really cheap hack), it didn't
> >> > work until I added some random putstr's below it (timing??).  Then the
> >> > kernel booted. 
> >> >
> >> > Still looking into things.  
> >> 
> >> Odd.  I wonder if I'm missing a serializing instruction somewhere,
> >> to ensure the effects of ``self modifying code'' aren't a problem.
> >> As I read Intel's documentation, if you have a jump before you get
> >> to the code there shouldn't be a problem.
> >> 
> >> Still that doesn't really explain bytes_out.
> >> 
> >
> > So I narrowed down the problem but it isn't obvious to me why this problem
> > exists.  Basically, even though bytes_out is supposed to be initialized to
> > 0, it becomes -1 before entering decompress_kernel().  Of course, the
> > fallout is that in flush_window() bytes_out winds up being one less than
> > outcnt, hence my original problem.
> >
> > Any thoughts on how to debug where this could be getting corrupted?  
> 
> Looking at my build it appears bytes_out is being placed in the .bss.
> A little odd since it is zero initialized but no big deal.
> Could you confirm that bytes_out is being placed in the .bss section
> by inspecting arch/x86_64/boot/compressed/misc.o and
> arch/x86_64/boot/compressed/vmlinux?  My technique was to run
> "readelf -a $file", look up the symbol's section number, and then check
> the section table to see which section that is.

Yes bytes_out is in the .bss for both files.  

> 
> If bytes_out is in the .bss for you then I suspect something is not
> correctly zeroing the .bss.  Or else the .bss is being stomped.
> 
> I'm not certain how rep stosb can be done wrong but some bad pointer
> math could have done it.

Even worse, from the time the .bss is cleared to the time gunzip() is
called inside decompress_kernel(), there is very little code to do some
stomping.  

So I am stuck trying to debug this.  This code seems very fragile.  The
more debug code I add (i.e. putstr), the more the length is off (it varies
from -32 to +1).  Makes me scratch my head as to what is really going on here.

I created a really pathetic patch to get the thing to boot but even that
doesn't make sense.  


diff --git a/arch/x86_64/boot/compressed/misc.c b/arch/x86_64/boot/compressed/misc.c
index 0e6c4b7..614416e 100644
--- a/arch/x86_64/boot/compressed/misc.c
+++ b/arch/x86_64/boot/compressed/misc.c
@@ -183,6 +183,7 @@ #define OLD_CL_MAGIC 0xA33F
 extern unsigned char input_data[];
 extern int input_len;
 
+static long dummy;
 static long bytes_out = 0;
 
 static void *malloc(int size);
@@ -594,6 +595,7 @@ asmlinkage void decompress_kernel(void *
 	if ((ulg)output >= 0xffffffffffUL)
 		error("Destination address too large");
 
+	bytes_out = 0;
 	makecrc();
 	putstr(".\nDecompressing Linux...");
 	gunzip();

And yes, the 'dummy' variable needs to be there.  
I am trying to use gdb on vmlinux to fish for clues.  But I am at a loss
right now.  

Cheers,
Don

> 
> Eric

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-07 23:57                                 ` Don Zickus
  2006-08-08  5:01                                   ` Eric W. Biederman
@ 2006-08-08 23:36                                   ` Andi Kleen
  1 sibling, 0 replies; 46+ messages in thread
From: Andi Kleen @ 2006-08-08 23:36 UTC (permalink / raw)
  To: Don Zickus
  Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm,
	linux-kernel

Don Zickus <dzickus@redhat.com> writes:
> > 
> > Odd.  I wonder if I'm missing a serializing instruction somewhere,
> > to ensure the effects of ``self modifying code'' aren't a problem.
> > As I read Intel's documentation, if you have a jump before you get
> > to the code there shouldn't be a problem.
> > 
> > Still that doesn't really explain bytes_out.
> > 

Sounds nasty.

> 
> So I narrowed down the problem but it isn't obvious to me why this problem
> exists.  Basically, even though bytes_out is supposed to be initialized to
> 0, it becomes -1 before entering decompress_kernel().  Of course, the
> fallout is that in flush_window() bytes_out winds up being one less than
> outcnt, hence my original problem.
> 
> Any thoughts on how to debug where this could be getting corrupted?  

Use a simulator (hopefully you can reproduce it in there) like qemu
or AMD SimNow and set a watch point on the address?

Or try to find someone who has an Intel target probe to help you out.

-Andi

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-08  5:01                                   ` Eric W. Biederman
  2006-08-08 19:36                                     ` Don Zickus
@ 2006-08-09 20:06                                     ` Don Zickus
  2006-08-10  6:09                                       ` Eric W. Biederman
  1 sibling, 1 reply; 46+ messages in thread
From: Don Zickus @ 2006-08-09 20:06 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm,
	linux-kernel, vgoyal

> Looking at my build it appears bytes_out is being placed in the .bss.
> A little odd since it is zero initialized but no big deal.
> Could you confirm that bytes_out is being placed in the .bss section
> by inspecting arch/x86_64/boot/compressed/misc.o and
> arch/x86_64/boot/compressed/vmlinux?  My technique was to run
> "readelf -a $file", look up the symbol's section number, and then check
> the section table to see which section that is.
> 
> If bytes_out is in the .bss for you then I suspect something is not
> correctly zeroing the .bss.  Or else the .bss is being stomped.
> 
> I'm not certain how rep stosb can be done wrong but some bad pointer
> math could have done it.
> 
> Eric

It seems Vivek came up with a solution that works.  He sent it to me this
morning.  We tested a bunch of machines and things seem to work now.  It
looks like it mimics the i386 behaviour now.

Cheers,
Don

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/boot/compressed/head.S |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff -puN arch/x86_64/boot/compressed/head.S~x86_64-bss-clearing-test arch/x86_64/boot/compressed/head.S
--- linux-2.6.18-rc3-1M/arch/x86_64/boot/compressed/head.S~x86_64-bss-clearing-test 2006-08-09 09:43:17.000000000 -0400
+++ linux-2.6.18-rc3-1M-root/arch/x86_64/boot/compressed/head.S 2006-08-09 09:43:34.000000000 -0400
@@ -235,8 +235,8 @@ relocated:
 /*
  * Clear BSS
  */
-       movq    $_edata, %rdi
-       movq    $_end, %rcx
+        leaq    _edata(%rbx), %rdi
+        leaq    _end(%rbx), %rcx
        subq    %rdi, %rcx
        cld
        rep
_


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-09 20:06                                     ` Don Zickus
@ 2006-08-10  6:09                                       ` Eric W. Biederman
  2006-08-10 13:13                                         ` Vivek Goyal
  0 siblings, 1 reply; 46+ messages in thread
From: Eric W. Biederman @ 2006-08-10  6:09 UTC (permalink / raw)
  To: Don Zickus
  Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm,
	linux-kernel, vgoyal

Don Zickus <dzickus@redhat.com> writes:

>> Looking at my build it appears bytes_out is being placed in the .bss.
>> A little odd since it is zero initialized but no big deal.
>> Could you confirm that bytes_out is being placed in the .bss section
>> by inspecting arch/x86_64/boot/compressed/misc.o and
>> arch/x86_64/boot/compressed/vmlinux?  My technique was to run
>> "readelf -a $file", look up the symbol's section number, and then check
>> the section table to see which section that is.
>> 
>> If bytes_out is in the .bss for you then I suspect something is not
>> correctly zeroing the .bss.  Or else the .bss is being stomped.
>> 
>> I'm not certain how rep stosb can be done wrong but some bad pointer
>> math could have done it.
>> 
>> Eric
>
> It seems Vivek came up with a solution that works.  He sent it to me this
> morning.  We tested a bunch of machines and things seem to work now.  It
> looks like it mimics the i386 behaviour now.

Yes, this looks right.  It looks like I forgot to make this change when
the logic from i386 was adapted to x86_64, ages ago.

This is exactly the place in the code I would have expected a bug
from the symptoms you were seeing.

Thanks all, I will include this in my version of the patches.

Eric

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-10  6:09                                       ` Eric W. Biederman
@ 2006-08-10 13:13                                         ` Vivek Goyal
  2006-08-10 17:05                                           ` Eric W. Biederman
  0 siblings, 1 reply; 46+ messages in thread
From: Vivek Goyal @ 2006-08-10 13:13 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Don Zickus, fastboot, Horms, Jan Kratochvil, H. Peter Anvin,
	Magnus Damm, linux-kernel

On Thu, Aug 10, 2006 at 12:09:56AM -0600, Eric W. Biederman wrote:
> Don Zickus <dzickus@redhat.com> writes:
> 
> >> Looking at my build it appears bytes_out is being placed in the .bss.
> >> A little odd since it is zero initialized but no big deal.
> >> Could you confirm that bytes_out is being placed in the .bss section
> >> by inspecting arch/x86_64/boot/compressed/misc.o and
> >> arch/x86_64/boot/compressed/vmlinux?  My technique was to run
> >> "readelf -a $file", look up the symbol's section number, and then check
> >> the section table to see which section that is.
> >> 
> >> If bytes_out is in the .bss for you then I suspect something is not
> >> correctly zeroing the .bss.  Or else the .bss is being stomped.
> >> 
> >> I'm not certain how rep stosb can be done wrong but some bad pointer
> >> math could have done it.
> >> 
> >> Eric
> >
> > It seems Vivek came up with a solution that works.  He sent it to me this
> > morning.  We tested a bunch of machines and things seem to work now.  It
> > looks like it mimics the i386 behaviour now.
> 
> Yes, this looks right.  It looks like I forgot to make this change when
> the logic from i386 was adapted to x86_64, ages ago.
> 
> This is exactly the place in the code I would have expected a bug
> from the symptoms you were seeing.
> 
> Thanks all, I will include this in my version of the patches.

Apart from this I think something is still off on x86_64. I have not
been able to make kdump work on x86_64. Second kernel simply hangs.
Two different machines are showing different results.

- On one machine, it seems to be stuck somewhere in decompress_kernel().
  Serial console is not behaving properly even with earlyprintk(). Somehow
  I feel it is some bss corruption even after my changes.

- The other machine seems to get as far as start_kernel() and even beyond
  (no messages on the console, all serial debugging) and then
  either hangs or jumps back to the BIOS.

Will look more into it.

Thanks
Vivek
 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-10 13:13                                         ` Vivek Goyal
@ 2006-08-10 17:05                                           ` Eric W. Biederman
  2006-08-10 18:18                                             ` Vivek Goyal
  0 siblings, 1 reply; 46+ messages in thread
From: Eric W. Biederman @ 2006-08-10 17:05 UTC (permalink / raw)
  To: vgoyal
  Cc: Don Zickus, fastboot, Horms, Jan Kratochvil, H. Peter Anvin,
	Magnus Damm, linux-kernel

Vivek Goyal <vgoyal@in.ibm.com> writes:

> Apart from this I think something is still off on x86_64. I have not
> been able to make kdump work on x86_64. Second kernel simply hangs.
> Two different machines are showing different results.
>
> - On one machine, it seems to be stuck somewhere in decompress_kernel().
>   Serial console is not behaving properly even with earlyprintk(). Somehow
>   I feel it is some bss corruption even after my changes.
>
> - The other machine seems to get as far as start_kernel() and even beyond
>   (no messages on the console, all serial debugging) and then
>   either hangs or jumps back to the BIOS.
>
> Will look more into it.

Thanks.

I'm a little disappointed but at this point it isn't a great surprise,
the code is early yet and hasn't had much testing or attention.
I wonder if I have missed something else silly.

As for testing, can you use plain kexec to load the kernel at a
different address?  I'm curious to know if it is something related
to the kexec on panic path or if it is just running at a different
location that is the problem.

I'm back on the namespace stuff this week so it will be a while before
I get back to this.  It doesn't look like I have time to work the whole
patchset at once.  So my current plan is to take as many pieces as
make sense by themselves and push them upstream, until only the
relocatable kernel patches are outstanding.

Everything was fairly well received on the round of reviews with some
minor nits that needed to be picked.  So I think this is doable.

Eric

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-10 17:05                                           ` Eric W. Biederman
@ 2006-08-10 18:18                                             ` Vivek Goyal
  2006-08-10 20:09                                               ` Eric W. Biederman
  0 siblings, 1 reply; 46+ messages in thread
From: Vivek Goyal @ 2006-08-10 18:18 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Don Zickus, fastboot, Horms, Jan Kratochvil, H. Peter Anvin,
	Magnus Damm, linux-kernel

On Thu, Aug 10, 2006 at 11:05:22AM -0600, Eric W. Biederman wrote:
> Vivek Goyal <vgoyal@in.ibm.com> writes:
> 
> > Apart from this I think something is still off on x86_64. I have not
> > been able to make kdump work on x86_64. Second kernel simply hangs.
> > Two different machines are showing different results.
> >
> > - On one machine, it seems to be stuck somewhere in decompress_kernel().
> >   Serial console is not behaving properly even with earlyprintk(). Somehow
> >   I feel it is some bss corruption even after my changes.
> >
> > - The other machine seems to get as far as start_kernel() and even beyond
> >   (no messages on the console, all serial debugging) and then
> >   either hangs or jumps back to the BIOS.
> >
> > Will look more into it.
> 
> Thanks.
> 
> I'm a little disappointed but at this point it isn't a great surprise,
> the code is early yet and hasn't had much testing or attention.
> I wonder if I have missed something else silly.
> 
> As for testing, can you use plain kexec to load the kernel at a
> different address?  I'm curious to know if it is something related
> to the kexec on panic path or if it is just running at a different
> location that is the problem.

Yes. This seems to be minor stuff. The parameter segment seems to be
getting stomped during decompression. Most probably this comes
from the extra space calculations (32K etc.) being done at run
time to find out where we should shift the compressed image.

Kexec works because the parameter segment is loaded below the
compressed image and does not get stomped over. :-)

I just reserved memory at a non-2MB-aligned location, 65MB@15MB, so that
the kernel is loaded at 16MB and the other smaller segments go below the
compressed image; then I could successfully boot into the kdump kernel.

So basically the kexec on panic path seems to be clean except for the
stomping issue.  Maybe the bzImage program header should reflect the right
"MemSize", one that takes into account the extra memory space calculations.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-10 18:18                                             ` Vivek Goyal
@ 2006-08-10 20:09                                               ` Eric W. Biederman
  2006-08-11 21:25                                                 ` Don Zickus
  2006-08-14 16:51                                                 ` [Fastboot] " Vivek Goyal
  0 siblings, 2 replies; 46+ messages in thread
From: Eric W. Biederman @ 2006-08-10 20:09 UTC (permalink / raw)
  To: vgoyal
  Cc: Don Zickus, fastboot, Horms, Jan Kratochvil, H. Peter Anvin,
	Magnus Damm, linux-kernel

Vivek Goyal <vgoyal@in.ibm.com> writes:

> On Thu, Aug 10, 2006 at 11:05:22AM -0600, Eric W. Biederman wrote:
>> Vivek Goyal <vgoyal@in.ibm.com> writes:
>> 
>> > Apart from this I think something is still off on x86_64. I have not
>> > been able to make kdump work on x86_64. Second kernel simply hangs.
>> > Two different machines are showing different results.
>> >
>> > - On one machine, it seems to be stuck somewhere in decompress_kernel().
>> >   Serial console is not behaving properly even with earlyprintk(). Somehow
>> >   I feel it is some bss corruption even after my changes.
>> >
>> > - The other machine seems to get as far as start_kernel() and even beyond
>> >   (no messages on the console, all serial debugging) and then
>> >   either hangs or jumps back to the BIOS.
>> >
>> > Will look more into it.
>> 
>> Thanks.
>> 
>> I'm a little disappointed but at this point it isn't a great surprise,
>> the code is early yet and hasn't had much testing or attention.
>> I wonder if I have missed something else silly.
>> 
>> As for testing, can you use plain kexec to load the kernel at a
>> different address?  I'm curious to know if it is something related
>> to the kexec on panic path or if it is just running at a different
>> location that is the problem.
>
> Yes. This seems to be minor stuff. The parameter segment seems to be
> getting stomped during decompression. Most probably this comes
> from the extra space calculations (32K etc.) being done at run
> time to find out where we should shift the compressed image.
>
> Kexec works because the parameter segment is loaded below the
> compressed image and does not get stomped over. :-)

Ah.  That makes sense.

> I just reserved memory at a non-2MB-aligned location, 65MB@15MB, so that
> the kernel is loaded at 16MB and the other smaller segments go below the
> compressed image; then I could successfully boot into the kdump kernel.

:)

> So basically the kexec on panic path seems to be clean except for the
> stomping issue.  Maybe the bzImage program header should reflect the right
> "MemSize", one that takes into account the extra memory space calculations.

Yes.  That sounds like the right thing to do.  

I remember trying to compute a good memsize when I created the bzImage
header, but it is entirely possible I missed some part of the
calculation or assumed that the kernel's .bss section would always be
larger than what I needed for decompression.

Eric

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-10 20:09                                               ` Eric W. Biederman
@ 2006-08-11 21:25                                                 ` Don Zickus
  2006-08-12  7:20                                                   ` Eric W. Biederman
  2006-08-14 16:51                                                 ` [Fastboot] " Vivek Goyal
  1 sibling, 1 reply; 46+ messages in thread
From: Don Zickus @ 2006-08-11 21:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: vgoyal, fastboot, Horms, Jan Kratochvil, H. Peter Anvin,
	Magnus Damm, linux-kernel

> >> 
> >> I'm a little disappointed but at this point it isn't a great surprise,
> >> the code is early yet and hasn't had much testing or attention.
> >> I wonder if I have missed something else silly.
> >> 
> >> As for testing, can you use plain kexec to load the kernel at a
> >> different address?  I'm curious to know if it is something related
> >> to the kexec on panic path or if it is just running at a different
> >> location that is the problem.
> >

I think I have found the 'something silly'.  Here is a patch that allows
our Dell EM64T boxes to boot.  This change matches the original code.  The
main difference that caused the problems was the setting of the _PAGE_NX bit.
This caused issues in early_ioremap().

Thanks to Larry Woodman for debugging this.  

Cheers,
Don


Signed-off-by:  Don Zickus <dzickus@redhat.com>

--- linux-2.6.17.noarch/arch/x86_64/mm/init.c.orig	2006-08-11 12:35:58.000000000 -0400
+++ linux-2.6.17.noarch/arch/x86_64/mm/init.c	2006-08-11 13:14:20.000000000 -0400
@@ -196,7 +196,7 @@
 		vaddr += addr & ~PMD_MASK;
 		addr &= PMD_MASK;
 		for (i = 0; i < pmds; i++, addr += PMD_SIZE)
-			set_pmd(pmd + i,__pmd(addr | __PAGE_KERNEL_LARGE));
+			set_pmd(pmd + i,__pmd(addr | _KERNPG_TABLE | _PAGE_PSE));
 		__flush_tlb();
 		return (void *)vaddr;
 	next:

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-11 21:25                                                 ` Don Zickus
@ 2006-08-12  7:20                                                   ` Eric W. Biederman
  2006-08-12 15:25                                                     ` Don Zickus
  2006-08-13 20:06                                                     ` Andi Kleen
  0 siblings, 2 replies; 46+ messages in thread
From: Eric W. Biederman @ 2006-08-12  7:20 UTC (permalink / raw)
  To: Don Zickus
  Cc: vgoyal, fastboot, Horms, Jan Kratochvil, H. Peter Anvin,
	Magnus Damm, linux-kernel

Don Zickus <dzickus@redhat.com> writes:

>> >> 
>> >> I'm a little disappointed but at this point it isn't a great surprise,
>> >> the code is early yet and hasn't had much testing or attention.
>> >> I wonder if I have missed something else silly.
>> >> 
>> >> As for testing, can you use plain kexec to load the kernel at a
>> >> different address?  I'm curious to know if it is something related
>> >> to the kexec on panic path or if it is just running at a different
>> >> location that is the problem.
>> >
>
> I think I have found the 'something silly'.  Here is a patch that allows
> our Dell EM64T boxes to boot.  This change matches the original code.  The
> main difference that caused the problems was the setting of the _PAGE_NX bit.
> This caused issues in early_ioremap().
>
> Thanks to Larry Woodman for debugging this.  

This looks like a different one but looks fairly sane.  

Do you know what code had problems with _PAGE_NX being set?
What are we doing with early_ioremap that requires execute
permissions?  It doesn't sound right that we would need
this.

Eric

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-12  7:20                                                   ` Eric W. Biederman
@ 2006-08-12 15:25                                                     ` Don Zickus
  2006-08-12 19:41                                                       ` Eric W. Biederman
  2006-08-13 20:06                                                     ` Andi Kleen
  1 sibling, 1 reply; 46+ messages in thread
From: Don Zickus @ 2006-08-12 15:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: vgoyal, fastboot, Horms, Jan Kratochvil, H. Peter Anvin,
	Magnus Damm, linux-kernel

On Sat, Aug 12, 2006 at 01:20:29AM -0600, Eric W. Biederman wrote:
> Don Zickus <dzickus@redhat.com> writes:
> 
> >> >> 
> >> >> I'm a little disappointed but at this point it isn't a great surprise,
> >> >> the code is early yet and hasn't had much testing or attention.
> >> >> I wonder if I have missed something else silly.
> >> >> 
> >> >> As for testing, can you use plain kexec to load the kernel at a
> >> >> different address?  I'm curious to know if it is something related
> >> >> to the kexec on panic path or if it is just running at a different
> >> >> location that is the problem.
> >> >
> >
> > I think I have found the 'something silly'.  Here is a patch that allows
> > our Dell em64t boxes to boot.  This change matches the original code.  The
> > main difference that caused the problems was the setting of _PAGE_NX bit.
> > This caused issues in early_io_remap().  
> >
> > Thanks to Larry Woodman for debugging this.  
> 
> This looks like a different one but looks fairly sane.  
> 
> Do you know what code had problems having _PAGE_NX set.
> What are we doing with early_ioremap the requires execute
> permissions.  It doesn't sound right that we would need
> this.

This fix is only needed for a subset of our em64t boxes, so it could be
just a chipset problem.  Supposedly, if I remember the conversation
correctly, when the kernel first boots it reserves about 40MB and about 20
pmds automatically.  After decompression, early_io_remap tries to set up
all the memory.  The conflict arose when early_io_remap tried to reuse one
of those pmds.  This caused the system to crash and reboot.  

I'll try to get more info Monday on the specifics.  

Cheers,
Don

> 
> Eric

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-12 15:25                                                     ` Don Zickus
@ 2006-08-12 19:41                                                       ` Eric W. Biederman
  0 siblings, 0 replies; 46+ messages in thread
From: Eric W. Biederman @ 2006-08-12 19:41 UTC (permalink / raw)
  To: Don Zickus
  Cc: vgoyal, fastboot, Horms, Jan Kratochvil, H. Peter Anvin,
	Magnus Damm, linux-kernel

Don Zickus <dzickus@redhat.com> writes:

>> This looks like a different one but looks fairly sane.  
>> 
>> Do you know what code had problems having _PAGE_NX set.
>> What are we doing with early_ioremap the requires execute
>> permissions.  It doesn't sound right that we would need
>> this.
>
> This fix is only needed for a subset of our em64t boxes, so it could be
> just a chipset problem.  Supposedly, if I remember the conversation
> correctly, when the kernel first boots it reserves about 40MB and about 20
> pmds automatically.  After decompression, early_io_remap tries to setup
> all the memory.  The conflict arose when early_io_remap tried to reuse one
> of those pmds.  This caused the system to crash and reboot.  
>
> I'll try to get more info Monday on the specifics.  

Thanks.


Eric

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-12  7:20                                                   ` Eric W. Biederman
  2006-08-12 15:25                                                     ` Don Zickus
@ 2006-08-13 20:06                                                     ` Andi Kleen
  2006-08-13 21:44                                                       ` Eric W. Biederman
  1 sibling, 1 reply; 46+ messages in thread
From: Andi Kleen @ 2006-08-13 20:06 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: fastboot, Jan Kratochvil, Horms, H. Peter Anvin, Magnus Damm,
	linux-kernel, dzickus

ebiederm@xmission.com (Eric W. Biederman) writes:
> 
> Do you know what code had problems having _PAGE_NX set.
> What are we doing with early_ioremap the requires execute
> permissions.  It doesn't sound right that we would need
> this.

The early EM64T CPUs didn't support NX and would GPF when
they hit the bit. That is why you always need to mask 
with __supported_pte_mask when using _PAGE_NX.
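
A minimal sketch of the masking described above, written as standalone C
rather than kernel code.  The bit positions follow the x86_64 page table
format (NX is bit 63); supported_mask() and make_entry() are hypothetical
stand-ins for __supported_pte_mask and the kernel's entry constructors,
not the real interfaces.

```c
#include <stdint.h>

#define PAGE_PRESENT (1ULL << 0)
#define PAGE_RW      (1ULL << 1)
#define PAGE_PSE     (1ULL << 7)   /* large page */
#define PAGE_NX      (1ULL << 63)  /* no-execute */

/* Hypothetical stand-in for __supported_pte_mask: every bit is allowed,
 * except NX when the CPU does not implement it. */
uint64_t supported_mask(int cpu_has_nx)
{
	return cpu_has_nx ? ~0ULL : ~PAGE_NX;
}

/* Build a page table entry and drop any flag the CPU does not support. */
uint64_t make_entry(uint64_t phys, uint64_t flags, int cpu_has_nx)
{
	return (phys | flags) & supported_mask(cpu_has_nx);
}
```

With cpu_has_nx == 0 the NX bit is silently cleared, so the same caller
code can be used on CPUs with and without NX support.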

-Andi

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-13 20:06                                                     ` Andi Kleen
@ 2006-08-13 21:44                                                       ` Eric W. Biederman
  0 siblings, 0 replies; 46+ messages in thread
From: Eric W. Biederman @ 2006-08-13 21:44 UTC (permalink / raw)
  To: Andi Kleen
  Cc: fastboot, Jan Kratochvil, Horms, H. Peter Anvin, Magnus Damm,
	linux-kernel, dzickus

Andi Kleen <ak@suse.de> writes:

> ebiederm@xmission.com (Eric W. Biederman) writes:
>> 
>> Do you know what code had problems having _PAGE_NX set.
>> What are we doing with early_ioremap the requires execute
>> permissions.  It doesn't sound right that we would need
>> this.
>
> The early EM64T CPUs didn't support NX and would GPF when
> they hit the bit. That is why you always need to mask 
> with __supported_pte_mask when using _PAGE_NX.

Ok.  Thanks.  That explains it.

The NX bit itself causes the GPF, not someone trying to execute
data on a page.

Eric

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-10 20:09                                               ` Eric W. Biederman
  2006-08-11 21:25                                                 ` Don Zickus
@ 2006-08-14 16:51                                                 ` Vivek Goyal
  2006-08-14 17:04                                                   ` H. Peter Anvin
  1 sibling, 1 reply; 46+ messages in thread
From: Vivek Goyal @ 2006-08-14 16:51 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Don Zickus, fastboot, Horms, Jan Kratochvil, H. Peter Anvin,
	Magnus Damm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2126 bytes --]

On Thu, Aug 10, 2006 at 02:09:58PM -0600, Eric W. Biederman wrote:
> > I just reserved memory at non 2MB aligned location 65MB@15MB so that
> > kernel is loaded at 16MB and other smaller segments below the compressed
> > image, then I can successfully booted into the kdump kernel.
> 
> :)
> 
> > So basically kexec on panic path seems to be clean except stomping issue.
> > May be bzImage program header should reflect right "MemSize" which
> > takes into account extra memory space calculations.
> 
> Yes.  That sounds like the right thing to do.  
> 
> I remember trying to compute a good memsize when I created the bzImage
> header but it is completely possible I missed some part of the
> calculation or assumed that the kernels .bss section would always be
> larger than what I needed for decompression.
>

Hi Eric,

Please find a patch attached to fix the issue. I have added a few things
which might be consuming memory beyond "MemSize", as described in the
misc.c file.

Regarding the decompressor code using the kernel's .bss section area, I
think that might not be possible, as the kernel's .bss is part of the raw
binary being generated (vmlinux.bin). So effectively it becomes part of
the input data and of the compressed output (vmlinux.bin.gz).

I think objcopy generally does not output the bss section in the raw
binary, but in the kernel's case .bss is somewhere in the middle of the
final image and not at the end, and that could be the reason that objcopy
is outputting bss in the raw binary image as well.

In the case of the second objcopy, where we generate vmlinux.bin from
the compressed kernel vmlinux (the vmlinux containing decompressor code),
the bss section does not seem to be part of the raw binary output. That's
the reason I had to pass another argument to tools/build.c to determine
the exact memory requirements of the compressed vmlinux.

So the decompressor cannot use the kernel's .bss for its execution, and
we should be taking the decompressor's memory requirements into account
while calculating "MemSize", irrespective of the kernel's .bss size? Am
I missing something?

If this seems reasonable, then I can roll out a similar patch for i386
too.

Thanks & Regards
Vivek

[-- Attachment #2: x86_64-bzImage-mem-size-adjustment-fix.patch --]
[-- Type: text/plain, Size: 9903 bytes --]



o Kdump on x86_64 fails because at run time bzImage decompression consumes
  more memory and stomps over some of the data loaded by kexec immediately
  after bzImage.

o How much memory bzImage will effectively consume at load time is exported
  through the "MemSize" field of the bzImage program headers.

o This patch makes more adjustments while calculating the load time
  memory requirements of bzImage, which gives the loader a clue about
  where it is safe to load some other data.

o The adjustments are as follows.

	- Add memory consumed by decompressor code (code+data+bss etc.).
	- Adjust the memory required for safe decompression (refer to misc.c).
	- Take into account the HEAP memory used by decompressor code.
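
  The adjustments listed above can be sketched as a standalone function
  that mirrors the arithmetic in the build.c hunk of this patch.  The
  function name bzimage_memsz() is made up for illustration, and the heap
  is assumed to be covered already by the compressed vmlinux's memory size
  through the vmlinux.lds change.

```c
#include <stdint.h>

/* All three inputs are byte counts:
 *   vmlinux_memsz   - memory span of the uncompressed vmlinux
 *   cvmlinux_memsz  - memory span of the vmlinux carrying the decompressor
 *   vmlinux_gz_size - size of vmlinux.bin.gz on disk
 */
uint64_t bzimage_memsz(uint64_t vmlinux_memsz,
		       uint64_t cvmlinux_memsz,
		       uint64_t vmlinux_gz_size)
{
	uint64_t memsz = vmlinux_memsz;

	/* Decompressor code, data and bss: compressed vmlinux minus payload. */
	memsz += cvmlinux_memsz - vmlinux_gz_size;

	/* 8 bytes for every 32K block: size * 8 / 32768 == size >> 12. */
	memsz += vmlinux_memsz >> 12;

	/* 32K + 18 bytes of extra slack (see misc.c). */
	memsz += 32768 + 18;

	/* Round up to a 4K boundary. */
	return (memsz + 4095) & ~(uint64_t)4095;
}
```

  For example, an 8 MiB kernel with a 2 MiB vmlinux.bin.gz and 64 KiB of
  decompressor overhead comes out at 8491008 bytes.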


Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/boot/Makefile               |    3 
 arch/x86_64/boot/compressed/vmlinux.lds |    2 
 arch/x86_64/boot/tools/build.c          |  129 ++++++++++++++++++++------------
 3 files changed, 87 insertions(+), 47 deletions(-)

diff -puN arch/x86_64/boot/tools/build.c~x86_64-bzImage-mem-size-adjustment-fix arch/x86_64/boot/tools/build.c
--- linux-2.6.18-rc3-1M/arch/x86_64/boot/tools/build.c~x86_64-bzImage-mem-size-adjustment-fix	2006-08-10 20:05:10.000000000 -0400
+++ linux-2.6.18-rc3-1M-root/arch/x86_64/boot/tools/build.c	2006-08-11 01:45:59.000000000 -0400
@@ -54,8 +54,13 @@ int fd;
 int is_big_kernel;
 
 #define MAX_PHDRS 100
-static Elf64_Ehdr ehdr;
-static Elf64_Phdr phdr[MAX_PHDRS];
+/* Uncompressed kernel vmlinux. */
+static Elf64_Ehdr vmlinux_ehdr;
+static Elf64_Phdr vmlinux_phdr[MAX_PHDRS];
+
+/* Compressed kernel vmlinux (With decompressor code attached)*/
+static Elf64_Ehdr cvmlinux_ehdr;
+static Elf64_Phdr cvmlinux_phdr[MAX_PHDRS];
 
 void die(const char * str, ...)
 {
@@ -98,80 +103,80 @@ void file_open(const char *name)
 		die("Unable to open `%s': %m", name);
 }
 
-static void read_ehdr(void)
+static void read_ehdr(Elf64_Ehdr *ehdr)
 {
-	if (read(fd, &ehdr, sizeof(ehdr)) != sizeof(ehdr)) {
+	if (read(fd, ehdr, sizeof(*ehdr)) != sizeof(*ehdr)) {
 		die("Cannot read ELF header: %s\n",
 			strerror(errno));
 	}
-	if (memcmp(ehdr.e_ident, ELFMAG, 4) != 0) {
+	if (memcmp(ehdr->e_ident, ELFMAG, 4) != 0) {
 		die("No ELF magic\n");
 	}
-	if (ehdr.e_ident[EI_CLASS] != ELFCLASS64) {
+	if (ehdr->e_ident[EI_CLASS] != ELFCLASS64) {
 		die("Not a 64 bit executable\n");
 	}
-	if (ehdr.e_ident[EI_DATA] != ELFDATA2LSB) {
+	if (ehdr->e_ident[EI_DATA] != ELFDATA2LSB) {
 		die("Not a LSB ELF executable\n");
 	}
-	if (ehdr.e_ident[EI_VERSION] != EV_CURRENT) {
+	if (ehdr->e_ident[EI_VERSION] != EV_CURRENT) {
 		die("Unknown ELF version\n");
 	}
 	/* Convert the fields to native endian */
-	ehdr.e_type      = elf16_to_cpu(ehdr.e_type);
-	ehdr.e_machine   = elf16_to_cpu(ehdr.e_machine);
-	ehdr.e_version   = elf32_to_cpu(ehdr.e_version);
-	ehdr.e_entry     = elf64_to_cpu(ehdr.e_entry);
-	ehdr.e_phoff     = elf64_to_cpu(ehdr.e_phoff);
-	ehdr.e_shoff     = elf64_to_cpu(ehdr.e_shoff);
-	ehdr.e_flags     = elf32_to_cpu(ehdr.e_flags);
-	ehdr.e_ehsize    = elf16_to_cpu(ehdr.e_ehsize);
-	ehdr.e_phentsize = elf16_to_cpu(ehdr.e_phentsize);
-	ehdr.e_phnum     = elf16_to_cpu(ehdr.e_phnum);
-	ehdr.e_shentsize = elf16_to_cpu(ehdr.e_shentsize);
-	ehdr.e_shnum     = elf16_to_cpu(ehdr.e_shnum);
-	ehdr.e_shstrndx  = elf16_to_cpu(ehdr.e_shstrndx);
+	ehdr->e_type      = elf16_to_cpu(ehdr->e_type);
+	ehdr->e_machine   = elf16_to_cpu(ehdr->e_machine);
+	ehdr->e_version   = elf32_to_cpu(ehdr->e_version);
+	ehdr->e_entry     = elf64_to_cpu(ehdr->e_entry);
+	ehdr->e_phoff     = elf64_to_cpu(ehdr->e_phoff);
+	ehdr->e_shoff     = elf64_to_cpu(ehdr->e_shoff);
+	ehdr->e_flags     = elf32_to_cpu(ehdr->e_flags);
+	ehdr->e_ehsize    = elf16_to_cpu(ehdr->e_ehsize);
+	ehdr->e_phentsize = elf16_to_cpu(ehdr->e_phentsize);
+	ehdr->e_phnum     = elf16_to_cpu(ehdr->e_phnum);
+	ehdr->e_shentsize = elf16_to_cpu(ehdr->e_shentsize);
+	ehdr->e_shnum     = elf16_to_cpu(ehdr->e_shnum);
+	ehdr->e_shstrndx  = elf16_to_cpu(ehdr->e_shstrndx);
 
-	if ((ehdr.e_type != ET_EXEC) && (ehdr.e_type != ET_DYN)) {
+	if ((ehdr->e_type != ET_EXEC) && (ehdr->e_type != ET_DYN)) {
 		die("Unsupported ELF header type\n");
 	}
-	if (ehdr.e_machine != EM_X86_64) {
+	if (ehdr->e_machine != EM_X86_64) {
 		die("Not for x86_64\n");
 	}
-	if (ehdr.e_version != EV_CURRENT) {
+	if (ehdr->e_version != EV_CURRENT) {
 		die("Unknown ELF version\n");
 	}
-	if (ehdr.e_ehsize != sizeof(Elf64_Ehdr)) {
+	if (ehdr->e_ehsize != sizeof(Elf64_Ehdr)) {
 		die("Bad Elf header size\n");
 	}
-	if (ehdr.e_phentsize != sizeof(Elf64_Phdr)) {
+	if (ehdr->e_phentsize != sizeof(Elf64_Phdr)) {
 		die("Bad program header entry\n");
 	}
-	if (ehdr.e_shentsize != sizeof(Elf64_Shdr)) {
+	if (ehdr->e_shentsize != sizeof(Elf64_Shdr)) {
 		die("Bad section header entry\n");
 	}
-	if (ehdr.e_shstrndx >= ehdr.e_shnum) {
+	if (ehdr->e_shstrndx >= ehdr->e_shnum) {
 		die("String table index out of bounds\n");
 	}
 }
 
-static void read_phds(void)
+static void read_phdrs(Elf64_Ehdr *ehdr, Elf64_Phdr *phdr)
 {
 	int i;
 	size_t size;
-	if (ehdr.e_phnum > MAX_PHDRS) {
+	if (ehdr->e_phnum > MAX_PHDRS) {
 		die("%d program headers supported: %d\n",
-			ehdr.e_phnum, MAX_PHDRS);
+			ehdr->e_phnum, MAX_PHDRS);
 	}
-	if (lseek(fd, ehdr.e_phoff, SEEK_SET) < 0) {
+	if (lseek(fd, ehdr->e_phoff, SEEK_SET) < 0) {
 		die("Seek to %d failed: %s\n",
-			ehdr.e_phoff, strerror(errno));
+			ehdr->e_phoff, strerror(errno));
 	}
-	size = sizeof(phdr[0])*ehdr.e_phnum;
-	if (read(fd, &phdr, size) != size) {
-		die("Cannot read ELF section headers: %s\n",
+	size = (sizeof(*phdr))*(ehdr->e_phnum);
+	if (read(fd, phdr, size) != size) {
+		die("Cannot read ELF program headers: %s\n",
 			strerror(errno));
 	}
-	for(i = 0; i < ehdr.e_phnum; i++) {
+	for(i = 0; i < ehdr->e_phnum; i++) {
 		phdr[i].p_type      = elf32_to_cpu(phdr[i].p_type);
 		phdr[i].p_flags     = elf32_to_cpu(phdr[i].p_flags);
 		phdr[i].p_offset    = elf64_to_cpu(phdr[i].p_offset);
@@ -183,13 +188,13 @@ static void read_phds(void)
 	}
 }
 
-uint64_t vmlinux_memsz(void)
+uint64_t elf_exec_memsz(Elf64_Ehdr *ehdr, Elf64_Phdr *phdr)
 {
 	uint64_t min, max, size;
 	int i;
 	max = 0;
 	min = ~max;
-	for(i = 0; i < ehdr.e_phnum; i++) {
+	for(i = 0; i < ehdr->e_phnum; i++) {
 		uint64_t start, end;
 		if (phdr[i].p_type != PT_LOAD)
 			continue;
@@ -200,31 +205,32 @@ uint64_t vmlinux_memsz(void)
 		if (end > max)
 			max = end;
 	}
-	/* Get the reported size by vmlinux */
+	/* Get the reported size by elf exec */
 	size = max - min;
 	return size;
 }
 
 void usage(void)
 {
-	die("Usage: build [-b] bootsect setup system rootdev vmlinux [> image]");
+	die("Usage: build [-b] bootsect setup system rootdev vmlinux vmlinux.bin.gz <vmlinux with decompressor code>[> image]");
 }
 
 int main(int argc, char ** argv)
 {
 	unsigned int i, sz, setup_sectors;
 	uint64_t kernel_offset, kernel_filesz, kernel_memsz;
+	uint64_t vmlinux_memsz, cvmlinux_memsz, vmlinux_gz_size;
 	int c;
 	u32 sys_size;
 	byte major_root, minor_root;
-	struct stat sb;
+	struct stat sb, vmlinux_gz_sb;
 
 	if (argc > 2 && !strcmp(argv[1], "-b"))
 	  {
 	    is_big_kernel = 1;
 	    argc--, argv++;
 	  }
-	if (argc != 6)
+	if (argc != 8)
 		usage();
 	if (!strcmp(argv[4], "CURRENT")) {
 		if (stat("/", &sb)) {
@@ -307,11 +313,42 @@ int main(int argc, char ** argv)
 	}
 	close(fd);
 
+	/* Open uncompressed vmlinux. */
 	file_open(argv[5]);
-	read_ehdr();
-	read_phds();
+	read_ehdr(&vmlinux_ehdr);
+	read_phdrs(&vmlinux_ehdr, vmlinux_phdr);
 	close(fd);
-	kernel_memsz = vmlinux_memsz();
+	vmlinux_memsz = elf_exec_memsz(&vmlinux_ehdr, vmlinux_phdr);
+
+	/* Process vmlinux.bin.gz */
+	file_open(argv[6]);
+	if (fstat (fd, &vmlinux_gz_sb))
+		die("Unable to stat `%s': %m", argv[6]);
+	close(fd);
+	vmlinux_gz_size = vmlinux_gz_sb.st_size;
+
+	/* Process compressed vmlinux (compressed vmlinux + decompressor) */
+	file_open(argv[7]);
+	read_ehdr(&cvmlinux_ehdr);
+	read_phdrs(&cvmlinux_ehdr, cvmlinux_phdr);
+	close(fd);
+	cvmlinux_memsz = elf_exec_memsz(&cvmlinux_ehdr, cvmlinux_phdr);
+
+	kernel_memsz = vmlinux_memsz;
+
+	/* Add decompressor code size */
+	kernel_memsz += cvmlinux_memsz - vmlinux_gz_size;
+
+	/* Refer arch/x86_64/boot/compressed/misc.c for following adj.
+	 * Add 8 bytes for every 32K input block
+	 */
+	kernel_memsz += vmlinux_memsz >> 12;
+
+	/* Add 32K + 18 bytes of extra slack */
+	kernel_memsz = kernel_memsz + (32768 + 18);
+
+	/* Align on a 4K boundary. */
+	kernel_memsz = (kernel_memsz + 4095) & (~4095);
 
 	if (lseek(1,  88, SEEK_SET) != 88)		    /* Write sizes to the bootsector */
 		die("Output: seek failed");
diff -puN arch/x86_64/boot/Makefile~x86_64-bzImage-mem-size-adjustment-fix arch/x86_64/boot/Makefile
--- linux-2.6.18-rc3-1M/arch/x86_64/boot/Makefile~x86_64-bzImage-mem-size-adjustment-fix	2006-08-11 00:53:32.000000000 -0400
+++ linux-2.6.18-rc3-1M-root/arch/x86_64/boot/Makefile	2006-08-11 00:56:27.000000000 -0400
@@ -41,7 +41,8 @@ $(obj)/bzImage: BUILDFLAGS   := -b
 
 quiet_cmd_image = BUILD   $@
 cmd_image = $(obj)/tools/build $(BUILDFLAGS) $(obj)/bootsect $(obj)/setup \
-	    $(obj)/vmlinux.bin $(ROOT_DEV) vmlinux > $@
+	    $(obj)/vmlinux.bin $(ROOT_DEV) vmlinux \
+	    $(obj)/compressed/vmlinux.bin.gz $(obj)/compressed/vmlinux > $@
 
 $(obj)/bzImage: $(obj)/bootsect $(obj)/setup \
 			      $(obj)/vmlinux.bin $(obj)/tools/build FORCE
diff -puN arch/x86_64/boot/compressed/vmlinux.lds~x86_64-bzImage-mem-size-adjustment-fix arch/x86_64/boot/compressed/vmlinux.lds
--- linux-2.6.18-rc3-1M/arch/x86_64/boot/compressed/vmlinux.lds~x86_64-bzImage-mem-size-adjustment-fix	2006-08-11 01:29:52.000000000 -0400
+++ linux-2.6.18-rc3-1M-root/arch/x86_64/boot/compressed/vmlinux.lds	2006-08-11 01:32:00.000000000 -0400
@@ -40,5 +40,7 @@ SECTIONS
 		pgtable = . ;
 		. = . + 4096 * 6;
 		_heap = .;
+		. = . + 0x6000;		/* misc.c, Heap size. */
+		_heap_end = .;
 	}
 }
_

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-14 16:51                                                 ` [Fastboot] " Vivek Goyal
@ 2006-08-14 17:04                                                   ` H. Peter Anvin
  2006-08-14 18:11                                                     ` Vivek Goyal
  2006-08-14 20:00                                                     ` Eric W. Biederman
  0 siblings, 2 replies; 46+ messages in thread
From: H. Peter Anvin @ 2006-08-14 17:04 UTC (permalink / raw)
  To: vgoyal
  Cc: Eric W. Biederman, Don Zickus, fastboot, Horms, Jan Kratochvil,
	Magnus Damm, linux-kernel

Vivek Goyal wrote:
> On Thu, Aug 10, 2006 at 02:09:58PM -0600, Eric W. Biederman wrote:
>>> I just reserved memory at non 2MB aligned location 65MB@15MB so that
>>> kernel is loaded at 16MB and other smaller segments below the compressed
>>> image, then I can successfully booted into the kdump kernel.
>> :)
>>
>>> So basically kexec on panic path seems to be clean except stomping issue.
>>> May be bzImage program header should reflect right "MemSize" which
>>> takes into account extra memory space calculations.
>> Yes.  That sounds like the right thing to do.  
>>
>> I remember trying to compute a good memsize when I created the bzImage
>> header but it is completely possible I missed some part of the
>> calculation or assumed that the kernels .bss section would always be
>> larger than what I needed for decompression.
>>

Could someone please describe the intended semantics of this MemSize 
header, *and* its intended usage?

	-hpa

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-14 17:04                                                   ` H. Peter Anvin
@ 2006-08-14 18:11                                                     ` Vivek Goyal
  2006-08-14 19:32                                                       ` H. Peter Anvin
  2006-08-14 20:00                                                     ` Eric W. Biederman
  1 sibling, 1 reply; 46+ messages in thread
From: Vivek Goyal @ 2006-08-14 18:11 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Eric W. Biederman, Don Zickus, fastboot, Horms, Jan Kratochvil,
	Magnus Damm, linux-kernel

On Mon, Aug 14, 2006 at 10:04:29AM -0700, H. Peter Anvin wrote:
> Vivek Goyal wrote:
> >On Thu, Aug 10, 2006 at 02:09:58PM -0600, Eric W. Biederman wrote:
> >>>I just reserved memory at non 2MB aligned location 65MB@15MB so that
> >>>kernel is loaded at 16MB and other smaller segments below the compressed
> >>>image, then I can successfully booted into the kdump kernel.
> >>:)
> >>
> >>>So basically kexec on panic path seems to be clean except stomping issue.
> >>>May be bzImage program header should reflect right "MemSize" which
> >>>takes into account extra memory space calculations.
> >>Yes.  That sounds like the right thing to do.  
> >>
> >>I remember trying to compute a good memsize when I created the bzImage
> >>header but it is completely possible I missed some part of the
> >>calculation or assumed that the kernels .bss section would always be
> >>larger than what I needed for decompression.
> >>
> 
> Could someone please describe the intended semantics of this MemSize 
> header, *and* its intended usage?
>

Now an ELF header (attached to bzImage) is being used to describe
the kernel executable. One program header of PT_LOAD type is being
created. The "p_filesz" field of the program header basically
describes the vmlinux file size, and "p_memsz" gives how
much memory will be consumed by the kernel image at load time.

Ideally "p_memsz" should be the summation of "p_memsz" over all the
program headers of the vmlinux file, but I guess in this case we are
stretching the ELF specification a little bit and also taking into account
the additional memory which will be used by the decompressor and
decompression logic by the time execution is transferred to the actual kernel.
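
As a rough illustration of the base value: the elf_exec_memsz() helper in
the patch appears to compute a span (highest segment end minus lowest
segment start) rather than a literal sum.  The sketch below shows that
idea in standalone C.  struct seg is a hypothetical, trimmed-down stand-in
for Elf64_Phdr, and the use of physical addresses is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define SEG_LOAD 1u /* same value as PT_LOAD */

/* Hypothetical, trimmed-down stand-in for Elf64_Phdr. */
struct seg {
	uint32_t type;
	uint64_t paddr;
	uint64_t memsz;
};

/* Bytes spanned from the lowest start to the highest end of all loadable
 * segments.  Gaps between segments are included.  Returns 0 when there is
 * no loadable segment. */
uint64_t load_span(const struct seg *segs, size_t n)
{
	uint64_t min = UINT64_MAX, max = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t end;

		if (segs[i].type != SEG_LOAD)
			continue;
		end = segs[i].paddr + segs[i].memsz;
		if (segs[i].paddr < min)
			min = segs[i].paddr;
		if (end > max)
			max = end;
	}
	return max > min ? max - min : 0;
}
```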

The intended usage is currently kexec/kdump. While pre-loading a
kernel in memory, kexec creates multiple segments and puts various
data into them (kernel image, initrd, parameters, etc.). Kexec
needs to know how much memory is being used by the loaded kernel so
that it can place another segment after the kernel at a safe distance.
By reading "p_memsz" from the ELF header, kexec can determine it.
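
A minimal sketch of that placement decision, in standalone C.  This is not
the kexec-tools code; next_segment_start() and align_up() are names made
up for illustration.

```c
#include <stdint.h>

/* Round addr up to a multiple of align (align must be a power of two). */
uint64_t align_up(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/* Lowest address at which a loader may place the next segment without
 * overlapping the memory the kernel claims through p_memsz. */
uint64_t next_segment_start(uint64_t kernel_paddr, uint64_t kernel_memsz,
			    uint64_t align)
{
	return align_up(kernel_paddr + kernel_memsz, align);
}
```

With the kernel loaded at 16MB and a "p_memsz" of 0x812345, a 4K-aligned
parameter segment would start at 0x1813000 rather than right after the
file image.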

Currently the problem we are facing in the kdump case is that the
parameter segment (command line and other bootloader parameters) is being
placed immediately after the kernel, where it gets stomped over by the
decompressor code, and the kernel boot fails.

Normal boot never faces this problem, as the parameter segment is always
loaded below where the kernel image is loaded.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-14 18:11                                                     ` Vivek Goyal
@ 2006-08-14 19:32                                                       ` H. Peter Anvin
  2006-08-14 19:42                                                         ` Vivek Goyal
  0 siblings, 1 reply; 46+ messages in thread
From: H. Peter Anvin @ 2006-08-14 19:32 UTC (permalink / raw)
  To: vgoyal
  Cc: Eric W. Biederman, Don Zickus, fastboot, Horms, Jan Kratochvil,
	Magnus Damm, linux-kernel

Vivek Goyal wrote:
> On Mon, Aug 14, 2006 at 10:04:29AM -0700, H. Peter Anvin wrote:
>> Vivek Goyal wrote:
>>> On Thu, Aug 10, 2006 at 02:09:58PM -0600, Eric W. Biederman wrote:
>>>>> I just reserved memory at non 2MB aligned location 65MB@15MB so that
>>>>> kernel is loaded at 16MB and other smaller segments below the compressed
>>>>> image, then I can successfully booted into the kdump kernel.
>>>> :)
>>>>
>>>>> So basically kexec on panic path seems to be clean except stomping issue.
>>>>> May be bzImage program header should reflect right "MemSize" which
>>>>> takes into account extra memory space calculations.
>>>> Yes.  That sounds like the right thing to do.  
>>>>
>>>> I remember trying to compute a good memsize when I created the bzImage
>>>> header but it is completely possible I missed some part of the
>>>> calculation or assumed that the kernels .bss section would always be
>>>> larger than what I needed for decompression.
>>>>
>> Could someone please describe the intended semantics of this MemSize 
>> header, *and* its intended usage?
>>
> 
> Now and ELF header(attached to bzImage) is being used to describe
> the kernel executable. One program header of PT_LOAD type is being
> created. The "p_filesz" field of program header is basically 
> describing the vmlinux file size and "p_memsz" is giving how
> much memory will be consumed by kernel image at load time.
> 
> Ideally "p_memsz" should be "p_memsz" summation of all the program
> headers of vmlinux file but I guess in this case we are stretching the
> ELF specification a little bit and also taking into the account the
> additional memory which will be used by decompressor and decompression
> logic by the time execution is transferred to the actual kernel.
> 

What about once the kernel is booted?

	-hpa

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-14 19:32                                                       ` H. Peter Anvin
@ 2006-08-14 19:42                                                         ` Vivek Goyal
  2006-08-14 19:45                                                           ` H. Peter Anvin
  0 siblings, 1 reply; 46+ messages in thread
From: Vivek Goyal @ 2006-08-14 19:42 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Eric W. Biederman, Don Zickus, fastboot, Horms, Jan Kratochvil,
	Magnus Damm, linux-kernel

On Mon, Aug 14, 2006 at 12:32:32PM -0700, H. Peter Anvin wrote:
> Vivek Goyal wrote:
> >On Mon, Aug 14, 2006 at 10:04:29AM -0700, H. Peter Anvin wrote:
> >>Vivek Goyal wrote:
> >>>On Thu, Aug 10, 2006 at 02:09:58PM -0600, Eric W. Biederman wrote:
> >>>>>I just reserved memory at non 2MB aligned location 65MB@15MB so that
> >>>>>kernel is loaded at 16MB and other smaller segments below the 
> >>>>>compressed
> >>>>>image, then I can successfully booted into the kdump kernel.
> >>>>:)
> >>>>
> >>>>>So basically kexec on panic path seems to be clean except stomping 
> >>>>>issue.
> >>>>>May be bzImage program header should reflect right "MemSize" which
> >>>>>takes into account extra memory space calculations.
> >>>>Yes.  That sounds like the right thing to do.  
> >>>>
> >>>>I remember trying to compute a good memsize when I created the bzImage
> >>>>header but it is completely possible I missed some part of the
> >>>>calculation or assumed that the kernels .bss section would always be
> >>>>larger than what I needed for decompression.
> >>>>
> >>Could someone please describe the intended semantics of this MemSize 
> >>header, *and* its intended usage?
> >>
> >
> >Now and ELF header(attached to bzImage) is being used to describe
> >the kernel executable. One program header of PT_LOAD type is being
> >created. The "p_filesz" field of program header is basically 
> >describing the vmlinux file size and "p_memsz" is giving how
> >much memory will be consumed by kernel image at load time.
> >
> >Ideally "p_memsz" should be "p_memsz" summation of all the program
> >headers of vmlinux file but I guess in this case we are stretching the
> >ELF specification a little bit and also taking into the account the
> >additional memory which will be used by decompressor and decompression
> >logic by the time execution is transferred to the actual kernel.
> >
> 
> What about once the kernel is booted?
> 

Sorry, I did not understand the question. A few more lines will help.

Thanks
Vivek 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-14 19:42                                                         ` Vivek Goyal
@ 2006-08-14 19:45                                                           ` H. Peter Anvin
  2006-08-14 19:57                                                             ` Vivek Goyal
  2006-08-14 20:10                                                             ` Eric W. Biederman
  0 siblings, 2 replies; 46+ messages in thread
From: H. Peter Anvin @ 2006-08-14 19:45 UTC (permalink / raw)
  To: vgoyal
  Cc: Eric W. Biederman, Don Zickus, fastboot, Horms, Jan Kratochvil,
	Magnus Damm, linux-kernel

Vivek Goyal wrote:
>>>
>> What about once the kernel is booted?
> 
> Sorry did not understand the question. Few more lines will help.
> 

Is this field intended to protect any kind of memory during the early 
boot phase of the kernel proper, or only the decompressor?

	-hpa


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-14 19:45                                                           ` H. Peter Anvin
@ 2006-08-14 19:57                                                             ` Vivek Goyal
  2006-08-14 20:10                                                             ` Eric W. Biederman
  1 sibling, 0 replies; 46+ messages in thread
From: Vivek Goyal @ 2006-08-14 19:57 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Eric W. Biederman, Don Zickus, fastboot, Horms, Jan Kratochvil,
	Magnus Damm, linux-kernel

On Mon, Aug 14, 2006 at 12:45:31PM -0700, H. Peter Anvin wrote:
> Vivek Goyal wrote:
> >>>
> >>What about once the kernel is booted?
> >
> >Sorry did not understand the question. Few more lines will help.
> >
> 
> Is this field intended to protect any kind of memory during the early 
> boot phase of the kernel proper, or only the decompressor?
>

I think it should protect against any dynamic memory usage during the early
boot phase too, until we reach a point where the kernel is aware of the
BIOS-provided memory maps and kernel memory area usage can be controlled
with the help of BIOS-provided/user-defined memory maps.

In the i386 implementation Eric is already taking into account the memory
used by the bootmem bitmap and initial page tables. I have not looked into
the x86_64 kernel code to see whether I need to make such adjustments. It
worked for me so I did not bother much. I will look into it.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-14 17:04                                                   ` H. Peter Anvin
  2006-08-14 18:11                                                     ` Vivek Goyal
@ 2006-08-14 20:00                                                     ` Eric W. Biederman
  1 sibling, 0 replies; 46+ messages in thread
From: Eric W. Biederman @ 2006-08-14 20:00 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: vgoyal, Don Zickus, fastboot, Horms, Jan Kratochvil, Magnus Damm,
	linux-kernel

"H. Peter Anvin" <hpa@zytor.com> writes:

> Vivek Goyal wrote:
>> On Thu, Aug 10, 2006 at 02:09:58PM -0600, Eric W. Biederman wrote:
>>>> I just reserved memory at non 2MB aligned location 65MB@15MB so that
>>>> kernel is loaded at 16MB and other smaller segments below the compressed
>>>> image, then I can successfully booted into the kdump kernel.
>>> :)
>>>
>>>> So basically kexec on panic path seems to be clean except stomping issue.
>>>> May be bzImage program header should reflect right "MemSize" which
>>>> takes into account extra memory space calculations.
>>> Yes.  That sounds like the right thing to do.
>>>
>>> I remember trying to compute a good memsize when I created the bzImage
>>> header but it is completely possible I missed some part of the
>>> calculation or assumed that the kernels .bss section would always be
>>> larger than what I needed for decompression.
>>>
>
> Could someone please describe the intended semantics of this MemSize header,
> *and* its intended usage?

I think Vivek did a decent job.  But here is my take.

Currently the ELF header we prepend to the Linux kernel has
exactly one segment.

A segment has several fields: file offset, alignment, type, physical
address, virtual address, file size, and memory size.

The file size parameter describes how much data to pull off of the
disk.  The memory size describes how much room the segment will
consume in memory.  The difference between file size and memory size
is treated as bss data.  Memory size must never be smaller than
file size.
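
As an illustration of the file size / memory size relationship, here is a
minimal Python sketch that unpacks one ELF64 program header and returns the
zero-filled tail. The field order follows the standard Elf64_Phdr layout; the
helper name bss_size and every value in the example header are made up for
illustration and are not taken from the patches.

```python
import struct

# Elf64_Phdr, little-endian: p_type, p_flags, p_offset, p_vaddr,
# p_paddr, p_filesz, p_memsz, p_align
ELF64_PHDR = struct.Struct("<IIQQQQQQ")
PT_LOAD = 1


def bss_size(phdr_bytes):
    """Return p_memsz - p_filesz, the part of the segment treated as bss."""
    (p_type, p_flags, p_offset, p_vaddr,
     p_paddr, p_filesz, p_memsz, p_align) = ELF64_PHDR.unpack(phdr_bytes)
    if p_memsz < p_filesz:
        raise ValueError("p_memsz must not be smaller than p_filesz")
    return p_memsz - p_filesz


# Hypothetical header: 4 MiB read from the file, 5 MiB occupied in memory.
example = ELF64_PHDR.pack(PT_LOAD, 7, 0x200000, 0xffffffff80000000,
                          0x100000, 4 << 20, 5 << 20, 0x200000)
print(hex(bss_size(example)))
```

A loader that honours the header has to keep the whole p_memsz range free,
not just the p_filesz bytes it copies.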

In the case of the kernel there is a certain amount of memory that
the kernel uses before it starts reserving things and using the
memory map.  The memory that the kernel unconditionally uses should
be described with the memsize parameter.

An accurate description allows your initrd and your parameter segment
to be placed right up next to your kernel without worrying about them
being stomped (we already do this on a couple of other architectures).
It also allows you to detect that there is not enough room to hold your
kernel, initrd and parameters.
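
A rough sketch of the kind of check an accurate memsize makes possible, in
Python. The function name fits_in_region, the segment sizes, the alignments
and the 16MB region at 16MB are all assumptions chosen for illustration; this
is not how kexec-tools is implemented.

```python
def fits_in_region(region_start, region_size, segments):
    """Place segments back to back and report whether they all fit.

    segments is a list of (name, memsz, align) tuples, where memsz is the
    memory the segment occupies and align is a power of two.
    Returns (fits, placements), placements mapping name -> start address.
    """
    addr = region_start
    end = region_start + region_size
    placements = {}
    for name, memsz, align in segments:
        addr = (addr + align - 1) & ~(align - 1)  # round up to alignment
        placements[name] = addr
        addr += memsz
    return addr <= end, placements


# Hypothetical layout: a 16 MiB reserved region starting at 16 MiB.
SEGMENTS = [
    ("kernel", 0x630004, 0x200000),  # memsz, not just the bytes on disk
    ("initrd", 4 << 20, 0x1000),
    ("params", 0x1000, 0x1000),
]
fits, where = fits_in_region(0x1000000, 16 << 20, SEGMENTS)
print(fits, {name: hex(addr) for name, addr in where.items()})
```

If the kernel entry understated its memsz, the initrd would be placed inside
memory the kernel still uses, which is the stomping described in this thread.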

Since we now have the possibility of describing this accurately, I
would like to.  The traditional x86 workaround of pushing everything
as far up in memory as the kernel can address is potentially still
an option.

For the kexec on panic case we have a very small reserved chunk of
memory (16MB I think is typical right now).  The smaller the chunk we
can successfully run out of, the better.  That makes it easy to hit
these kinds of problems if we don't have an accurate description of
the kernel.

Eric

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-14 19:45                                                           ` H. Peter Anvin
  2006-08-14 19:57                                                             ` Vivek Goyal
@ 2006-08-14 20:10                                                             ` Eric W. Biederman
  2006-08-14 20:59                                                               ` Vivek Goyal
  1 sibling, 1 reply; 46+ messages in thread
From: Eric W. Biederman @ 2006-08-14 20:10 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: vgoyal, Don Zickus, fastboot, Horms, Jan Kratochvil, Magnus Damm,
	linux-kernel

"H. Peter Anvin" <hpa@zytor.com> writes:

> Vivek Goyal wrote:
>>>>
>>> What about once the kernel is booted?
>> Sorry did not understand the question. Few more lines will help.
>>
>
> Is this field intended to protect any kind of memory during the early boot phase
> of the kernel proper, or only the decompressor?

Yes, the field should account for memory usage until the kernel starts
doing the accounting at run time.

I'm actually surprised that taking into account the .bss was not enough to
cover up anything the decompressor was doing.  Usually the kernel's .bss
is more than the extra 32K or so that the decompressor uses.

Eric

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-14 20:10                                                             ` Eric W. Biederman
@ 2006-08-14 20:59                                                               ` Vivek Goyal
  2006-08-14 21:15                                                                 ` Eric W. Biederman
  0 siblings, 1 reply; 46+ messages in thread
From: Vivek Goyal @ 2006-08-14 20:59 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: H. Peter Anvin, Don Zickus, fastboot, Horms, Jan Kratochvil,
	Magnus Damm, linux-kernel

On Mon, Aug 14, 2006 at 02:10:51PM -0600, Eric W. Biederman wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
> 
> > Vivek Goyal wrote:
> >>>>
> >>> What about once the kernel is booted?
> >> Sorry did not understand the question. Few more lines will help.
> >>
> >
> > Is this field intended to protect any kind of memory during the early boot phase
> > of the kernel proper, or only the decompressor?
> 
> Yes, the field should account for memory usage until the kernel starts
> doing the accounting at run time.
> 
> I'm actually surprised that taking into account the .bss was not enough to
> cover up anything the decompressor was doing.  Usually the kernel's .bss
> is more than the extra 32K or so that the decompressor uses.
>

I think the .bss section will act as a buffer for the decompressor only if
.bss is not part of the compressed data, so that the decompressor does not
have to move beyond .bss and can run from the kernel's .bss space.

But on my machine it looks like .bss is very much part of the raw binary
image and hence part of the compressed data (vmlinux.bin.gz). The memsz
exported in the bzImage is the same as the size of the raw output binary.

That is probably why we are stomping on other segments in my case and, if
my understanding is right, it should happen irrespective of the kernel's
.bss size.

Below are the program headers of the kernel vmlinux file.
.bss is mapped by the first program header along with .text.

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000200000 0xffffffff80000000 0x0000000000000000
                 0x0000000000546bf8 0x00000000005dbc28  RWE    200000
  LOAD           0x00000000007dc000 0xffffffff805dc000 0x00000000005dc000
                 0x000000000000ede0 0x000000000000ede0  RW     200000
  LOAD           0x0000000000800000 0xffffffffff600000 0x00000000005eb000
                 0x0000000000000c08 0x0000000000000c08  RWE    200000
  LOAD           0x00000000009ec000 0xffffffff805ec000 0x00000000005ec000
                 0x0000000000044004 0x0000000000044004  RWE    200000
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RWE    8

 Section to Segment mapping:
  Segment Sections...
   00     .text __ex_table .rodata .pci_fixup __ksymtab __ksymtab_gpl
__ksymtab_unused __ksymtab_gpl_future __ksymtab_strings __param
.eh_frame .data .bss
   01     .data.cacheline_aligned .data.read_mostly
   02     .vsyscall_0 .xtime_lock .vxtime .wall_jiffies .sys_tz
.sysctl_vsyscall .xtime .jiffies .vsyscall_1 .vsyscall_2 .vsyscall_3
   03     .data.init_task .data.page_aligned .smp_altinstructions
.smp_locks .smp_altinstr_replacement .init.text .init.data .init.setup
.initcall.init .con_initcall.init .altinstructions .altinstr_replacement
.exit.text .init.ramfs .data.percpu .data_nosave
   04
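
The numbers in the listing above can be checked with a small Python sketch.
It assumes the raw image is a flat binary spanning the lowest to the highest
loaded byte (the way objcopy -O binary fills the gaps between segments); the
function name image_extents is made up for illustration.

```python
# (p_paddr, p_filesz, p_memsz) of the four LOAD segments listed above
LOAD_SEGMENTS = [
    (0x000000, 0x546bf8, 0x5dbc28),
    (0x5dc000, 0x00ede0, 0x00ede0),
    (0x5eb000, 0x000c08, 0x000c08),
    (0x5ec000, 0x044004, 0x044004),
]


def image_extents(segments):
    """Return (flat_image_size, memory_size, trailing_slack) in bytes."""
    base = min(paddr for paddr, _, _ in segments)
    file_end = max(paddr + filesz for paddr, filesz, _ in segments)
    mem_end = max(paddr + memsz for paddr, _, memsz in segments)
    return file_end - base, mem_end - base, mem_end - file_end


flat, mem, slack = image_extents(LOAD_SEGMENTS)
print(hex(flat), hex(mem), hex(slack))
```

The .bss gap of the first segment (0x546bf8 to 0x5dbc28) lies below the later
segments, so under that assumption it ends up inside the flat image and no
zero-filled tail is left past the end for the decompressor to use.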
 
Thanks
Vivek
 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-14 20:59                                                               ` Vivek Goyal
@ 2006-08-14 21:15                                                                 ` Eric W. Biederman
  0 siblings, 0 replies; 46+ messages in thread
From: Eric W. Biederman @ 2006-08-14 21:15 UTC (permalink / raw)
  To: vgoyal
  Cc: H. Peter Anvin, Don Zickus, fastboot, Horms, Jan Kratochvil,
	Magnus Damm, linux-kernel

Vivek Goyal <vgoyal@in.ibm.com> writes:

> On Mon, Aug 14, 2006 at 02:10:51PM -0600, Eric W. Biederman wrote:
>> "H. Peter Anvin" <hpa@zytor.com> writes:
>> 
>> > Vivek Goyal wrote:
>> >>>>
>> >>> What about once the kernel is booted?
>> >> Sorry did not understand the question. Few more lines will help.
>> >>
>> >
>> > Is this field intended to protect any kind of memory during the early boot phase
>> > of the kernel proper, or only the decompressor?
>> 
>> Yes, the field should account for memory usage until the kernel starts
>> doing the accounting at run time.
>> 
>> I'm actually surprised that taking into account the .bss was not enough to
>> cover up anything the decompressor was doing.  Usually the kernel's .bss
>> is more than the extra 32K or so that the decompressor uses.
>>
>
> I think .bss section size will act as a buffer for decompressor only if
> .bss is not part of compressed data hence decompressor does not have to
> move beyond bss and it can run very well from kernel bss space. 

Agreed.  

> But somehow on my machine, it looks like that bss is very much part
> of raw binary image hence part of compressed data (vmlinux.bin.gz).
> memsz exported in bzImage is same as size of raw output binary.
>
> Probably that's the reason that we are stomping other segments in my
> case and if my understanding is right then it should happen irrespective
> of kernel bss size.
>
> Here I am pasting how kernel vmlinux file program headers look like.
> .bss is mapped by first program header along with .text.

Ok.  So somehow we have done the insane thing of putting .bss in the middle of
the executable.  It might even be sane if it is just the .init sections we put
after it, but no we are putting .data after the .bss.

Well that easily explains why we had a problem.

Getting the proper accounting in for handling this case is probably reasonable.
It probably also makes sense for someone to take a good hard look at the crazy
ordering of sections on x86_64.

Eric

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-07-31 16:19                   ` [CFT] ELF Relocatable x86 and x86_64 bzImages Eric W. Biederman
@ 2006-08-25 20:16                       ` Vivek Goyal
  2006-08-04 21:08                     ` Don Zickus
  2006-08-25 20:16                       ` Vivek Goyal
  2 siblings, 0 replies; 46+ messages in thread
From: Vivek Goyal @ 2006-08-25 20:16 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: fastboot, Jan Kratochvil, Magnus Damm, Horms, Linda Wang,
	linux-kernel, H. Peter Anvin, linuxppc64-dev

On Mon, Jul 31, 2006 at 10:19:04AM -0600, Eric W. Biederman wrote:
> 
> I have spent some time and have gotten my relocatable kernel patches
> working against the latest kernels.  I intend to push this upstream
> shortly.
> 
> Could all of the people who care take a look and test this out
> to make certain that it doesn't just work on my test box?
> 
> My approach is to extend bzImage so that it is an ET_DYN ELF executable
> (we have what used to be a bootsector where we can put the header).
> Boot loaders are explicitly not expected to process relocations.
> 
> The x86_64 kernel is simply built to live at a fixed virtual address
> and the boot page tables are relocated.  The i386 kernel is built
> to process relocates generated with --embedded-relocs (after vmlinux.lds.S)
> has been fixed up to sort out static and dynamic relocations.
> 
> Currently there are 33 patches in my tree to do this.
> 
> The weirdest symptom I have had so far is that page faults did not
> trigger the early exception handler on x86_64 (instead I got a reboot).
> 
> The code should be available shortly at:
> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-reloc.git#reloc-v2.6.18-rc3
> 
> If all goes well with the testing I will push the patches to Andrew in the next couple 
> of days.

It breaks the powerpc build, as powerpc does not seem to define the symbol
_text, which is used by the arch-independent kallsyms.c. Attached is the
one-line fix.

Thanks
Vivek


o ppc64 does not seem to define the symbol _text, which is used by
  kernel/kallsyms.c with the relocatable kernel patches. Symbol addresses
  are no longer stored as absolute values but as offsets from the symbol
  _text (_text + offset), so that relocation entries for this section are
  generated if need be. (Currently i386 will be the only user once
  the relocatable kernel patches are merged.)

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/powerpc/kernel/vmlinux.lds.S |    1 +
 1 file changed, 1 insertion(+)

diff -puN arch/powerpc/kernel/vmlinux.lds.S~ppc64-compilation-fix arch/powerpc/kernel/vmlinux.lds.S
--- linux-2.6.18-rc3-1M/arch/powerpc/kernel/vmlinux.lds.S~ppc64-compilation-fix	2006-08-24 16:16:17.000000000 -0400
+++ linux-2.6.18-rc3-1M-root/arch/powerpc/kernel/vmlinux.lds.S	2006-08-24 16:26:33.000000000 -0400
@@ -33,6 +33,7 @@ SECTIONS
 
 	/* Text and gots */
 	.text : {
+		_text = .;
 		*(.text .text.*)
 		SCHED_TEXT
 		LOCK_TEXT
_
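
The idea behind the _text + offset encoding can be sketched in Python. This
is only an illustration: the real table is emitted as assembly and fixed up
through relocation entries, and the helper names and all addresses below are
hypothetical.

```python
def to_offsets(text_base, addresses):
    """Encode absolute symbol addresses as offsets from _text."""
    return [addr - text_base for addr in addresses]


def resolve(text_base, offsets):
    """Recover absolute addresses from offsets once _text is known."""
    return [text_base + off for off in offsets]


# Hypothetical link-time layout.
LINK_TEXT = 0xc0100000
link_addrs = [0xc0100000, 0xc0100420, 0xc0234000]
offsets = to_offsets(LINK_TEXT, link_addrs)

# The kernel ends up 16 MiB higher than it was linked; only _text changes.
run_text = LINK_TEXT + (16 << 20)
print([hex(a) for a in resolve(run_text, offsets)])
```

Every entry depends on a single base, so moving the kernel means adjusting
_text rather than rewriting each stored address by hand. That is why every
architecture using kallsyms.c now has to define _text.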

^ permalink raw reply	[flat|nested] 46+ messages in thread


end of thread, other threads:[~2006-08-25 20:17 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
     [not found] <aec7e5c30606300145p441d8d0xd89fab5e87de5a22@mail.gmail.com>
     [not found] ` <20060705222448.GC992@in.ibm.com>
     [not found]   ` <aec7e5c30607051932r49bbcc7eh2c190daa06859dcc@mail.gmail.com>
     [not found]     ` <20060706081520.GB28225@host0.dyn.jankratochvil.net>
     [not found]       ` <aec7e5c30607070147g657d2624qa93a145dd4515484@mail.gmail.com>
     [not found]         ` <20060707133518.GA15810@in.ibm.com>
     [not found]           ` <20060707143519.GB13097@host0.dyn.jankratochvil.net>
     [not found]             ` <20060710233219.GF16215@in.ibm.com>
     [not found]               ` <20060711010815.GB1021@host0.dyn.jankratochvil.net>
     [not found]                 ` <m1d5c92yv4.fsf@ebiederm.dsl.xmission.com>
2006-07-31 16:19                   ` [CFT] ELF Relocatable x86 and x86_64 bzImages Eric W. Biederman
2006-07-31 20:25                     ` Vivek Goyal
2006-07-31 21:00                       ` [Fastboot] " Vivek Goyal
2006-08-01  2:31                         ` Eric W. Biederman
2006-08-01  2:34                           ` H. Peter Anvin
2006-08-01  3:44                             ` Eric W. Biederman
2006-08-01  4:25                           ` Jan Kratochvil
2006-08-01  9:09                             ` Eric W. Biederman
2006-08-01  9:43                               ` Jan Kratochvil
2006-08-01 11:28                                 ` Eric W. Biederman
2006-08-04 21:08                     ` Don Zickus
2006-08-04 21:25                       ` Eric W. Biederman
2006-08-04 23:43                         ` Don Zickus
2006-08-05  7:49                           ` Eric W. Biederman
2006-08-05 16:07                           ` Eric W. Biederman
2006-08-07 17:44                             ` Don Zickus
2006-08-07 18:08                               ` Eric W. Biederman
2006-08-07 23:57                                 ` Don Zickus
2006-08-08  5:01                                   ` Eric W. Biederman
2006-08-08 19:36                                     ` Don Zickus
2006-08-09 20:06                                     ` Don Zickus
2006-08-10  6:09                                       ` Eric W. Biederman
2006-08-10 13:13                                         ` Vivek Goyal
2006-08-10 17:05                                           ` Eric W. Biederman
2006-08-10 18:18                                             ` Vivek Goyal
2006-08-10 20:09                                               ` Eric W. Biederman
2006-08-11 21:25                                                 ` Don Zickus
2006-08-12  7:20                                                   ` Eric W. Biederman
2006-08-12 15:25                                                     ` Don Zickus
2006-08-12 19:41                                                       ` Eric W. Biederman
2006-08-13 20:06                                                     ` Andi Kleen
2006-08-13 21:44                                                       ` Eric W. Biederman
2006-08-14 16:51                                                 ` [Fastboot] " Vivek Goyal
2006-08-14 17:04                                                   ` H. Peter Anvin
2006-08-14 18:11                                                     ` Vivek Goyal
2006-08-14 19:32                                                       ` H. Peter Anvin
2006-08-14 19:42                                                         ` Vivek Goyal
2006-08-14 19:45                                                           ` H. Peter Anvin
2006-08-14 19:57                                                             ` Vivek Goyal
2006-08-14 20:10                                                             ` Eric W. Biederman
2006-08-14 20:59                                                               ` Vivek Goyal
2006-08-14 21:15                                                                 ` Eric W. Biederman
2006-08-14 20:00                                                     ` Eric W. Biederman
2006-08-08 23:36                                   ` Andi Kleen
2006-08-25 20:16                     ` Vivek Goyal
2006-08-25 20:16                       ` Vivek Goyal
