* [CFT] ELF Relocatable x86 and x86_64 bzImages
From: Eric W. Biederman @ 2006-07-31 16:19 UTC
To: fastboot
Cc: Jan Kratochvil, Magnus Damm, Horms, Vivek Goyal, Linda Wang, linux-kernel, H. Peter Anvin

I have spent some time and have gotten my relocatable kernel patches
working against the latest kernels. I intend to push this upstream
shortly.

Could all of the people who care take a look and test this out
to make certain that it doesn't just work on my test box?

My approach is to extend bzImage so that it is an ET_DYN ELF
executable (we have what used to be a bootsector where we can put the
header). Boot loaders are explicitly not expected to process
relocations.

The x86_64 kernel is simply built to live at a fixed virtual address
and the boot page tables are relocated. The i386 kernel is built to
process relocations generated with --embedded-relocs (after
vmlinux.lds.S has been fixed up to sort out static and dynamic
relocations).

Currently there are 33 patches in my tree to do this.

The weirdest symptom I have had so far is that page faults did not
trigger the early exception handler on x86_64 (instead I got a reboot).

The code should be available shortly at:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-reloc.git#reloc-v2.6.18-rc3

If all goes well with the testing I will push the patches to Andrew in
the next couple of days.

Eric

^ permalink raw reply [flat|nested] 47+ messages in thread
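To make "the bzImage is an ET_DYN ELF executable" concrete, here is a rough sketch of the first thing a loader would look at: the ELF magic and the e_type field, using the constants from glibc's <elf.h>. It assumes the ELF header starts at byte 0 of the image (plausible given the bootsector placement, but not checked against the patches), and the helper name is hypothetical. It deliberately does no relocation processing, since boot loaders are not expected to.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if buf starts with a little-endian ELF header whose e_type is
 * ET_DYN, else 0. e_ident (EI_NIDENT = 16 bytes) and e_type (2 bytes at
 * offset 16) are at the same place in ELF32 and ELF64, so 18 bytes suffice. */
static int is_et_dyn_image(const unsigned char *buf, size_t len)
{
    unsigned int e_type;

    if (len < EI_NIDENT + 2)
        return 0;
    if (memcmp(buf, ELFMAG, SELFMAG) != 0)   /* "\177ELF" magic */
        return 0;
    if (buf[EI_DATA] != ELFDATA2LSB)         /* x86 images are little-endian */
        return 0;
    /* Decode e_type explicitly as little-endian, independent of host order. */
    e_type = buf[EI_NIDENT] | (buf[EI_NIDENT + 1] << 8);
    return e_type == ET_DYN;
}
```

Reading e_type byte by byte keeps the check independent of the host's endianness, which matters if the image is inspected on a machine other than the one it boots on.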
* Re: [CFT] ELF Relocatable x86 and x86_64 bzImages
From: Vivek Goyal @ 2006-07-31 20:25 UTC
To: Eric W. Biederman
Cc: fastboot, Jan Kratochvil, Magnus Damm, Horms, Linda Wang, linux-kernel, H. Peter Anvin

On Mon, Jul 31, 2006 at 10:19:04AM -0600, Eric W. Biederman wrote:
> 
> I have spent some time and have gotten my relocatable kernel patches
> working against the latest kernels. I intend to push this upstream
> shortly.
> 
> Could all of the people who care take a look and test this out
> to make certain that it doesn't just work on my test box?
> 

Hi Eric,

Currently I am testing your patches on i386. With CONFIG_RELOCATABLE=y
kernel boots fine and kexec also works.

But my kernel hangs on kexec on panic case. It hangs early in
decompress_kernel(). Kernel hangs at following condition.

+       if (((u32)output - CONFIG_PHYSICAL_START) & 0x3fffff)
+               error("Destination address not 4M aligned");

I have reserved 64MB at 16M and kernel is loaded at 16M. I had expected
that I would get "Destination address not 4M aligned" on the serial
console, but it did not happen. I had to put outb() calls in to get to
this point. Will look more into it.

Thanks
Vivek
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
From: Vivek Goyal @ 2006-07-31 21:00 UTC
To: Eric W. Biederman
Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel

On Mon, Jul 31, 2006 at 04:25:20PM -0400, Vivek Goyal wrote:
> On Mon, Jul 31, 2006 at 10:19:04AM -0600, Eric W. Biederman wrote:
> > 
> > I have spent some time and have gotten my relocatable kernel patches
> > working against the latest kernels. I intend to push this upstream
> > shortly.
> > 
> > Could all of the people who care take a look and test this out
> > to make certain that it doesn't just work on my test box?
> > 
> Hi Eric,
> 
> Currently I am testing your patches on i386. With CONFIG_RELOCATABLE=y
> kernel boots fine and kexec also works.
> 
> But my kernel hangs on kexec on panic case. It hangs early in
> decompress_kernel(). Kernel hangs at following condition.
> 
> +       if (((u32)output - CONFIG_PHYSICAL_START) & 0x3fffff)
> +               error("Destination address not 4M aligned");
> 

Ok. I am decompressing the kernel to 16MB, and after subtracting
CONFIG_PHYSICAL_START (1MB) I am left with 15MB, which is not 4M
aligned, hence I seem to be running into it.

I changed it to

        if ((u32)output & 0x3fffff)

and the kdump kernel booted fine. But this will run into issues if I
load the kernel at 1MB.

I have a dumb question. Why do I have to load the kernel at 4MB
alignment? The existing kernel loads at 1MB, which is not 4MB aligned,
and it works fine?

Thanks
Vivek
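Vivek's arithmetic can be checked directly. The sketch below evaluates the patch's condition and his workaround as plain functions, assuming CONFIG_PHYSICAL_START is 1MB as in his setup; the function names and the MB() macro are illustrative only and do not come from the patch.

```c
#include <stdint.h>

#define MB(x) ((uint32_t)(x) << 20)

/* Condition from the patch: a nonzero result means the decompressor reports
 * "Destination address not 4M aligned". The load address must be a
 * multiple of 4MB above CONFIG_PHYSICAL_START. */
static uint32_t patch_check(uint32_t output, uint32_t physical_start)
{
    return (output - physical_start) & 0x3fffff;
}

/* Vivek's workaround: require absolute 4MB alignment instead. */
static uint32_t workaround_check(uint32_t output)
{
    return output & 0x3fffff;
}
```

With a 16MB load address the patch's check leaves 3MB (15MB mod 4MB), so it fires; the workaround accepts 16MB but rejects the default 1MB address, which is exactly the trade-off Vivek points out.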
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
From: Eric W. Biederman @ 2006-08-01 2:31 UTC
To: vgoyal
Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel

Vivek Goyal <vgoyal@in.ibm.com> writes:

> On Mon, Jul 31, 2006 at 04:25:20PM -0400, Vivek Goyal wrote:
>> On Mon, Jul 31, 2006 at 10:19:04AM -0600, Eric W. Biederman wrote:
>> > 
>> > I have spent some time and have gotten my relocatable kernel patches
>> > working against the latest kernels. I intend to push this upstream
>> > shortly.
>> > 
>> > Could all of the people who care take a look and test this out
>> > to make certain that it doesn't just work on my test box?
>> > 
>> Hi Eric,
>> 
>> Currently I am testing your patches on i386. With CONFIG_RELOCATABLE=y
>> kernel boots fine and kexec also works.
>> 
>> But my kernel hangs on kexec on panic case. It hangs early in
>> decompress_kernel(). Kernel hangs at following condition.
>> 
>> +       if (((u32)output - CONFIG_PHYSICAL_START) & 0x3fffff)
>> +               error("Destination address not 4M aligned");
>> 

As for the missing print: did you have an appropriate earlyprintk?

> Ok. I am decompressing the kernel to 16MB, and after subtracting
> CONFIG_PHYSICAL_START (1MB) I am left with 15MB, which is not 4M
> aligned, hence I seem to be running into it.
> 
> I changed it to
> 
>         if ((u32)output & 0x3fffff)
> 
> and the kdump kernel booted fine. But this will run into issues if I
> load the kernel at 1MB.
> 
> I have a dumb question. Why do I have to load the kernel at 4MB
> alignment? The existing kernel loads at 1MB, which is not 4MB aligned,
> and it works fine?

4MB is a little harsh, but I haven't worked through what the exact rules
are; I know 4MB is the worst case alignment for arch/i386.

The rule is that we have to be at the same offset from 4MB as we
were built to run at. So in this case we need an address where
(address % 4MB) == 1MB.

We might be able to get away with 2MB alignment.

I thought kexec-tools did that calculation automatically for an ET_DYN
image, but it has been a while since I looked.

My goal with the check was to catch problems early before something bad
happened.

Eric
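Eric's rule can be written as a simple predicate. This is a sketch assuming the alignment is a power of two; the function name and the MB() macro are made up for illustration. Because unsigned subtraction wraps modulo 2^32, which is a multiple of any power-of-two alignment, the patch's `(output - CONFIG_PHYSICAL_START) & (align - 1)` test and this "same offset within the block" form accept exactly the same addresses.

```c
#include <stdbool.h>
#include <stdint.h>

#define MB(x) ((uint32_t)(x) << 20)

/* The load address must sit at the same offset within an align-sized
 * block as the address the kernel was linked for.
 * align must be a power of two. */
static bool load_addr_ok(uint32_t addr, uint32_t link_addr, uint32_t align)
{
    return (addr & (align - 1)) == (link_addr & (align - 1));
}
```

Passing MB(2) instead of MB(4) as the alignment shows what the looser 2MB rule Eric mentions would allow. For example, with a 1MB link address, 15MB becomes acceptable.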
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
From: H. Peter Anvin @ 2006-08-01 2:34 UTC
To: Eric W. Biederman
Cc: vgoyal, fastboot, Horms, Jan Kratochvil, Magnus Damm, linux-kernel

Eric W. Biederman wrote:
> 
>> Ok. I am decompressing the kernel to 16MB, and after subtracting
>> CONFIG_PHYSICAL_START (1MB) I am left with 15MB, which is not 4M
>> aligned, hence I seem to be running into it.
>>
>> I changed it to
>>
>>         if ((u32)output & 0x3fffff)
>>
>> and the kdump kernel booted fine. But this will run into issues if I
>> load the kernel at 1MB.
>>
>> I have a dumb question. Why do I have to load the kernel at 4MB
>> alignment? The existing kernel loads at 1MB, which is not 4MB aligned,
>> and it works fine?
> 
> 4MB is a little harsh, but I haven't worked through what the exact rules
> are; I know 4MB is the worst case alignment for arch/i386.
> 

4 MB would be worst case for i386; 2 MB for x86-64. Actually the x86-64
worst case would be 1 GB, but that's more than a little bit extreme.

	-hpa
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
From: Eric W. Biederman @ 2006-08-01 3:44 UTC
To: H. Peter Anvin
Cc: vgoyal, fastboot, Horms, Jan Kratochvil, Magnus Damm, linux-kernel

"H. Peter Anvin" <hpa@zytor.com> writes:

> Eric W. Biederman wrote:
>> 
>>> Ok. I am decompressing the kernel to 16MB, and after subtracting
>>> CONFIG_PHYSICAL_START (1MB) I am left with 15MB, which is not 4M
>>> aligned, hence I seem to be running into it.
>>>
>>> I changed it to
>>>
>>>         if ((u32)output & 0x3fffff)
>>>
>>> and the kdump kernel booted fine. But this will run into issues if I
>>> load the kernel at 1MB.
>>>
>>> I have a dumb question. Why do I have to load the kernel at 4MB
>>> alignment? The existing kernel loads at 1MB, which is not 4MB aligned,
>>> and it works fine?
>> 4MB is a little harsh, but I haven't worked through what the exact rules
>> are; I know 4MB is the worst case alignment for arch/i386.
>> 
> 
> 4 MB would be worst case for i386; 2 MB for x86-64. Actually the x86-64
> worst case would be 1 GB, but that's more than a little bit extreme.

Yep, and that is what I test for, except for the gigabyte case, which we
don't currently implement. Although I can imagine that gigabyte pages
might be interesting for the identity-mapped part of the page table.

Eric
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
From: Jan Kratochvil @ 2006-08-01 4:25 UTC
To: Eric W. Biederman
Cc: vgoyal, fastboot, Horms, H. Peter Anvin, Magnus Damm, linux-kernel

On Tue, 01 Aug 2006 04:31:43 +0200, Eric W. Biederman wrote:
...
> 4MB is a little harsh, but I haven't worked through what the exact rules
> are; I know 4MB is the worst case alignment for arch/i386.
> 
> The rule is that we have to be at the same offset from 4MB as we
> were built to run at. So in this case we need an address where
> (address % 4MB) == 1MB.

In such a case your patch is not optimal. The original VA Linux Japan patch 2.0
	http://mkdump.sourceforge.net/cvs.html
	cvs -q -z3 -d:pserver:anonymous:@mkdump.cvs.sourceforge.net:/cvsroot/mkdump rdiff -u -r bp_linux-2_6-minik -r linux-2_6-minik linux
had lower alignment requirements and these were really tested at that time.

i386 had alignment requirement:
	/* current_thread_info()&co. are 8192-alignment fixed (for the initial stack). */
	#if CONFIG_PHYSICAL_START & 0x1FFF
	#error "CONFIG_PHYSICAL_START must be 2*PAGE_SIZE (0x2000) aligned!"
	#endif
as IIRC those i386 2MB/4MB pages must be (apparently) 2MB/4MB aligned in the
virtual address space but their physical target address can be arbitrary.

and x86_64 alignment requirement was:
	#if (CONFIG_PHYSICAL_START - 0x100000) & 0x1FFFFF
	#error "CONFIG_PHYSICAL_START must be '2MB * x + 1MB' aligned!"
	#endif
while IIRC those x86_64 2MB pages need to have even the physical target address
2MB aligned. Lower alignment would require suboptimal execution by not using
the 2MB pages (and the patch would have to handle it appropriately).

( I did not check your patches as they are locked in that useless GIT anyway. )

Regards,
Lace
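Jan's two preprocessor checks can only test one CONFIG_PHYSICAL_START per build. Rewriting them as runtime predicates makes it easy to compare candidate addresses against the 4MB rule. The conditions below are copied from the quoted #if lines; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* i386 check from the mkdump patch: 8KB (2 * PAGE_SIZE) alignment,
 * justified there by the initial stack / current_thread_info(). */
static bool minik_i386_ok(uint32_t start)
{
    return (start & 0x1FFF) == 0;
}

/* x86_64 check from the mkdump patch: start == 2MB * x + 1MB. */
static bool minik_x86_64_ok(uint64_t start)
{
    return ((start - 0x100000) & 0x1FFFFF) == 0;
}
```

Under the mkdump i386 rule, a 16MB start (Vivek's case) is accepted because it is 8KB aligned. On x86_64, 16MB is rejected and 17MB is accepted.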
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
From: Eric W. Biederman @ 2006-08-01 9:09 UTC
To: Jan Kratochvil
Cc: vgoyal, fastboot, Horms, H. Peter Anvin, Magnus Damm, linux-kernel

Jan Kratochvil <lace@jankratochvil.net> writes:

> On Tue, 01 Aug 2006 04:31:43 +0200, Eric W. Biederman wrote:
> ...
>> 4MB is a little harsh, but I haven't worked through what the exact rules
>> are; I know 4MB is the worst case alignment for arch/i386.
>> 
>> The rule is that we have to be at the same offset from 4MB as we
>> were built to run at. So in this case we need an address where
>> (address % 4MB) == 1MB.
> 
> In such a case your patch is not optimal. The original VA Linux Japan patch 2.0
> 	http://mkdump.sourceforge.net/cvs.html
> 	cvs -q -z3 -d:pserver:anonymous:@mkdump.cvs.sourceforge.net:/cvsroot/mkdump rdiff -u -r bp_linux-2_6-minik -r linux-2_6-minik linux
> had lower alignment requirements and these were really tested at that time.
> 
> i386 had alignment requirement:
> 	/* current_thread_info()&co. are 8192-alignment fixed (for the initial stack). */
> 	#if CONFIG_PHYSICAL_START & 0x1FFF
> 	#error "CONFIG_PHYSICAL_START must be 2*PAGE_SIZE (0x2000) aligned!"
> 	#endif
> as IIRC those i386 2MB/4MB pages must be (apparently) 2MB/4MB aligned in the
> virtual address space but their physical target address can be arbitrary.

I know you can't use huge pages if your physical address is not
properly aligned. Which can be a performance impact if nothing else.
Not something I want to encourage in a general purpose kernel.

If it is actually a problem once we get past the user confusion aspect
of this I will happily revisit it. The big confusion in all of this is
that with a 4MB alignment and a 1MB offset the usable cases are:
1MB, 5MB, 9MB, 13MB, 17MB, 21MB...
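The list of usable addresses (1MB, 5MB, 9MB, ...) is what you get by rounding a candidate address up to the next value with the 1MB offset. The sketch below is a minimal helper that a loader or kdump setup script could use to pick a load address inside a reserved region. The name and interface are assumptions for illustration, not kexec-tools code.

```c
#include <stdint.h>

#define MB(x) ((uint32_t)(x) << 20)

/* Smallest address >= base with (addr % align) == offset.
 * align must be a power of two and offset < align; overflow near 4GB
 * is not handled in this sketch. */
static uint32_t next_load_addr(uint32_t base, uint32_t align, uint32_t offset)
{
    uint32_t a = (base & ~(align - 1)) + offset;  /* candidate in base's block */

    if (a < base)
        a += align;                               /* already past it: next block */
    return a;
}
```

For Vivek's reservation starting at 16MB, this yields 17MB, so the first 1MB of the 64MB region goes unused.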

What I did that is rather unique is I actually enforce this in misc.c
so there is no way we can slip by our alignment requirements.

I'm not terribly comfortable with the 8K alignment number as we only
tell the linker we need 4K alignment. So there might be other implicit
things out there as well. Although I admit head.S may be the only place
we can get away with that kind of thing, as the linker can move
everything else around. Groan, yet another kernel audit if we go this
route.

> and x86_64 alignment requirement was:
> 	#if (CONFIG_PHYSICAL_START - 0x100000) & 0x1FFFFF
> 	#error "CONFIG_PHYSICAL_START must be '2MB * x + 1MB' aligned!"
> 	#endif
> while IIRC those x86_64 2MB pages need to have even the physical target address
> 2MB aligned. Lower alignment would require suboptimal execution by not using
> the 2MB pages (and the patch would have to handle it appropriately).

Yes. I have that check. Except now the check really is
(CONFIG_PHYSICAL_START & 0x1FFFFF) == 0 because the x86_64 kernel lives
at 2MB by default now, so it can really get the benefit of huge pages.

> ( I did not check your patches as they are locked in that useless GIT anyway. )

( As opposed to the unusable CVS, I presume :)

I guess I should just post them so we can have a sane conversation :)

Eric
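The 8K figure comes from how i386 locates thread_info. current_thread_info() masks the stack pointer down to a THREAD_SIZE boundary. The sketch below models that masking, assuming 8KB stacks; the function name is hypothetical. It shows why the initial stack's alignment matters when the linker has only been promised 4K.

```c
#include <stdint.h>

#define THREAD_SIZE 8192u  /* assumes 8KB i386 kernel stacks */

/* Model of i386 current_thread_info(): round the stack pointer down to a
 * THREAD_SIZE boundary. Only correct if the stack itself is THREAD_SIZE
 * aligned. */
static uint32_t thread_info_from_sp(uint32_t esp)
{
    return esp & ~(THREAD_SIZE - 1);
}
```

A stack at 0x102000 (8K aligned) resolves correctly from anywhere inside it. A stack that is only 4K aligned, at 0x103000, resolves to 0x104000 from its upper half, pointing outside the stack's thread_info.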
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
From: Jan Kratochvil @ 2006-08-01 9:43 UTC
To: Eric W. Biederman
Cc: vgoyal, fastboot, Horms, H. Peter Anvin, Magnus Damm, linux-kernel

On Tue, 01 Aug 2006 11:09:28 +0200, Eric W. Biederman wrote:
> Jan Kratochvil <lace@jankratochvil.net> writes:
...
> > i386 had alignment requirement:
> > 	/* current_thread_info()&co. are 8192-alignment fixed (for the initial stack). */
> > 	#if CONFIG_PHYSICAL_START & 0x1FFF
> > 	#error "CONFIG_PHYSICAL_START must be 2*PAGE_SIZE (0x2000) aligned!"
> > 	#endif
> > as IIRC those i386 2MB/4MB pages must be (apparently) 2MB/4MB aligned in the
> > virtual address space but their physical target address can be arbitrary.
> 
> I know you can't use huge pages if your physical address is not
> properly aligned. Which can be a performance impact if nothing else.
> Not something I want to encourage in a general purpose kernel.

So you would rather crash than run with that unmeasurably lower performance?

IIRC those 2MB/4MB pages performance "gain" is still present (in my patch) even
if the kernel location is not 2MB/4MB aligned because the i386 2MB/4MB
pagetable entries can have arbitrary physical memory target address.
But maybe I am wrong here, sorry, I really do not remember it well.
(It 100% worked with the "full performance" if aligned and it "worked" if
unaligned but I do not remember if it worked "full performance" if unaligned.)

...
> I'm not terribly comfortable with the 8K alignment number as we only
> tell the linker we need 4K alignment.

Yes, it should be fixed there so that the stacks get allocated 8KB-aligned not
depending on the kernel code position at all. That means allocating the
initial stack by code and not relying on its autoallocation by the linker.
There would remain the 4KB alignment requirement due to the physical target
address of the pagetable entries.

...
> > ( I did not check your patches as they are locked in that useless GIT anyway. )
> ( As opposed to the unusable CVS, I presume :)

Yes, it has the same unusability as CVS, just it loses the feature of being
the standard. I assume some CVS flamewar already occurred some time ago.

Regards,
Lace
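Jan recalls that 4MB pages can have arbitrary physical targets, while Eric says huge pages need an aligned physical address. The Intel SDM describes 32-bit paging with CR4.PSE set: a 4MB PDE stores only bits 31:22 of the physical base, and the low 22 bits of the final address come from the linear address. The sketch below models that translation. It ignores the PAT and PSE-36 bits, and the names are hypothetical. It shows that a 17MB base cannot be expressed and collapses onto 16MB, which is consistent with Eric's position.

```c
#include <stdint.h>

#define PDE_P   0x001u               /* present */
#define PDE_RW  0x002u               /* writable */
#define PDE_PS  0x080u               /* page size: this entry maps a 4MB page */
#define PDE_4M_BASE_MASK 0xFFC00000u /* bits 31:22 hold the physical base */

/* Build a non-PAE 4MB page-directory entry (PAT/PSE-36 bits ignored).
 * Any physical address bits below bit 22 simply cannot be stored. */
static uint32_t make_pde_4m(uint32_t phys)
{
    return (phys & PDE_4M_BASE_MASK) | PDE_PS | PDE_RW | PDE_P;
}

/* Translation for a 4MB page: base from the PDE, low 22 bits from the
 * linear address. */
static uint32_t translate_4m(uint32_t pde, uint32_t linear)
{
    return (pde & PDE_4M_BASE_MASK) | (linear & 0x003FFFFFu);
}
```

So a kernel loaded at a physical address that is not 4MB aligned either has to give up 4MB pages for that range or map the wrong memory.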
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
From: Eric W. Biederman @ 2006-08-01 11:28 UTC
To: Jan Kratochvil
Cc: vgoyal, fastboot, Horms, H. Peter Anvin, Magnus Damm, linux-kernel

Jan Kratochvil <lace@jankratochvil.net> writes:

> So you would rather crash than run with that unmeasurably lower performance?

No. Simply, I would rather not boot than run something I'm not certain
will work. If we align things deliberately for better performance I
don't want to cope with that either.

> ...
>> I'm not terribly comfortable with the 8K alignment number as we only
>> tell the linker we need 4K alignment.
> 
> Yes, it should be fixed there so that the stacks get allocated 8KB-aligned not
> depending on the kernel code position at all. That means allocating the
> initial stack by code and not relying on its autoallocation by the linker.
> There would remain the 4KB alignment requirement due to the physical target
> address of the pagetable entries.

So, thinking about this: by processing relocations we end up with no
page-table-related relocation restrictions, except that we must be
within the identity-mapped page table area. Not even the 4KB is
directly a page-table-related alignment restriction.

So the right answer is to review the arch/i386 kernel and make certain
we don't have any implicit alignment requirements (and if we do, make
them explicit so the linker will honor and report them). At which point
all I need to do is to copy the required alignment from vmlinux to the
ELF header of the bzImage.

>> > ( I did not check your patches as they are locked in that useless GIT anyway. )

For code review, sending patches is still the best way to do it.
Patches in email are easier to comment on, and require less work for
people to actually look at. So since you have complained I have sent
out all of the patches.

My evil plan is to keep making interesting things available in GIT until
it is no longer considered useless :)

Eric
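Eric's plan is to make every alignment requirement explicit to the linker and then copy it from vmlinux into the bzImage's ELF header. The linker records segment alignment in each program header's p_align field, which appears in the Align column of `readelf -l vmlinux`. The sketch below covers only the "copy" half. It assumes the caller has already read the vmlinux program headers into memory in host byte order. The helper name is hypothetical, and this is not the actual build tool.

```c
#include <elf.h>
#include <stddef.h>

/* Largest p_align among PT_LOAD segments (1 if there are none). A build
 * step could write this value into the bzImage's own program header. */
static Elf32_Word max_load_align(const Elf32_Phdr *ph, size_t n)
{
    Elf32_Word align = 1;

    for (size_t i = 0; i < n; i++)
        if (ph[i].p_type == PT_LOAD && ph[i].p_align > align)
            align = ph[i].p_align;
    return align;
}
```

Only PT_LOAD segments are considered, because other segment types (such as notes) describe data inside loaded segments rather than independent placement constraints.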
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
From: Don Zickus @ 2006-08-04 21:08 UTC
To: Eric W. Biederman
Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel

On Mon, Jul 31, 2006 at 10:19:04AM -0600, Eric W. Biederman wrote:
> 
> I have spent some time and have gotten my relocatable kernel patches
> working against the latest kernels. I intend to push this upstream
> shortly.
> 
> Could all of the people who care take a look and test this out
> to make certain that it doesn't just work on my test box?

Is there any reason I would get the following error on x86_64 using your patches?

Filesystem type is ext2fs, partition type 0x83
kernel /bzImage ro root=LABEL=/1 console=ttyS0,115200 earlyprintk=ttyS0,115200
   [Linux-bzImage, setup=0x1c00, size=0x24917c]
initrd /initrd-2.6.18-rc3.img
   [Linux-initrd @ 0x37e0d000, 0x1e25e7 bytes]

.
Decompressing Linux...

length error

 -- System halted


I can get i386 to boot fine. I can't for the life of me figure out what I
am doing wrong.

Cheers,
Don
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
From: Eric W. Biederman @ 2006-08-04 21:25 UTC
To: Don Zickus
Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel

Don Zickus <dzickus@redhat.com> writes:

> On Mon, Jul 31, 2006 at 10:19:04AM -0600, Eric W. Biederman wrote:
>> 
>> I have spent some time and have gotten my relocatable kernel patches
>> working against the latest kernels. I intend to push this upstream
>> shortly.
>> 
>> Could all of the people who care take a look and test this out
>> to make certain that it doesn't just work on my test box?
> 
> Is there any reason I would get the following error on x86_64 using your patches?

There shouldn't be.

> Filesystem type is ext2fs, partition type 0x83
> kernel /bzImage ro root=LABEL=/1 console=ttyS0,115200 earlyprintk=ttyS0,115200
>    [Linux-bzImage, setup=0x1c00, size=0x24917c]
> initrd /initrd-2.6.18-rc3.img
>    [Linux-initrd @ 0x37e0d000, 0x1e25e7 bytes]
> 
> .
> Decompressing Linux...
> 
> length error
> 
>  -- System halted
> 
> 
> I can get i386 to boot fine. I can't for the life of me figure out what I
> am doing wrong.

The length error comes from lib/inflate.c

I think it would be interesting to look at orig_len and bytes_out.

My hunch is that I have tripped over a tool chain bug or a weird
alignment issue.

The error is that the uncompressed length does not match the stored length
of the data from before we compressed it. Now what is
fascinating is that our crc's match (as that check is performed first).

Something is very slightly off and I don't see what it is.

After looking at the state variables I would probably start looking
at the uncompressed data to see if it really was decompressing
properly. If nothing else that is the kind of process that would tend
to spark a clue.

Eric
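For context on the "length error": RFC 1952 specifies that a gzip member ends with an 8-byte trailer. The trailer holds the CRC-32 of the uncompressed data followed by its length modulo 2^32 (ISIZE), both little-endian. Per Eric's description, the decompressor compares the CRC first and the length second. The sketch below is a minimal model of those two comparisons; the function names and return codes are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value. */
static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Check the gzip trailer against what the decompressor computed.
 * Returns 0 if both match, 1 on CRC mismatch, 2 on length mismatch
 * (the "length error" case), -1 if the input is too short. */
static int check_gzip_trailer(const unsigned char *gz, size_t gz_len,
                              uint32_t computed_crc, uint32_t bytes_out)
{
    const unsigned char *trailer;

    if (gz_len < 8)
        return -1;
    trailer = gz + gz_len - 8;
    if (get_le32(trailer) != computed_crc)    /* CRC32 is checked first */
        return 1;
    if (get_le32(trailer + 4) != bytes_out)   /* then ISIZE */
        return 2;
    return 0;
}
```

A CRC match followed by a one-byte length mismatch lands in the second branch. Because the CRC is computed over the data itself and the length is a separate counter, that combination points at the byte counter rather than at corrupt output.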
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
From: Don Zickus @ 2006-08-04 23:43 UTC
To: Eric W. Biederman
Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel

> The length error comes from lib/inflate.c
> 
> I think it would be interesting to look at orig_len and bytes_out.
> 
> My hunch is that I have tripped over a tool chain bug or a weird
> alignment issue.

I thought so too, but I took vmlinuz images from people (Vivek) who had it
boot on their systems but those images still failed on my two machines.

> 
> The error is that the uncompressed length does not match the stored length
> of the data from before we compressed it. Now what is
> fascinating is that our crc's match (as that check is performed first).
> 
> Something is very slightly off and I don't see what it is.

I printed out orig_len -> 5910532 (which matches vmlinux.bin)
              bytes_out -> 5910531

> 
> After looking at the state variables I would probably start looking
> at the uncompressed data to see if it really was decompressing
> properly. If nothing else that is the kind of process that would tend
> to spark a clue.

I am not familiar with the code, so very few sparks are flying. I'll
still dig through though. Thanks for the tips.

Cheers,
Don
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
From: Eric W. Biederman @ 2006-08-05 7:49 UTC
To: Don Zickus
Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel

Don Zickus <dzickus@redhat.com> writes:

>> The length error comes from lib/inflate.c
>> 
>> I think it would be interesting to look at orig_len and bytes_out.
>> 
>> My hunch is that I have tripped over a tool chain bug or a weird
>> alignment issue.
> 
> I thought so too, but I took vmlinuz images from people (Vivek) who had it
> boot on their systems but those images still failed on my two machines.

Odd. That might narrow things down. This is just booting with grub so
there is no relocation-specific weirdness coming into play.

>> The error is that the uncompressed length does not match the stored length
>> of the data from before we compressed it. Now what is
>> fascinating is that our crc's match (as that check is performed first).
>> 
>> Something is very slightly off and I don't see what it is.
> 
> I printed out orig_len -> 5910532 (which matches vmlinux.bin)
>               bytes_out -> 5910531

Is the last byte of vmlinux.bin 0?

One byte off certainly fits my pattern of something slightly off.

>> After looking at the state variables I would probably start looking
>> at the uncompressed data to see if it really was decompressing
>> properly. If nothing else that is the kind of process that would tend
>> to spark a clue.
> 
> I am not familiar with the code, so very few sparks are flying. I'll
> still dig through though. Thanks for the tips.

Welcome.

Eric
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
From: Eric W. Biederman @ 2006-08-05 16:07 UTC
To: Don Zickus
Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel

Don Zickus <dzickus@redhat.com> writes:

>> The length error comes from lib/inflate.c
>> 
>> I think it would be interesting to look at orig_len and bytes_out.
>> 
>> My hunch is that I have tripped over a tool chain bug or a weird
>> alignment issue.
> 
> I thought so too, but I took vmlinuz images from people (Vivek) who had it
> boot on their systems but those images still failed on my two machines.
> 
>> 
>> The error is that the uncompressed length does not match the stored length
>> of the data from before we compressed it. Now what is
>> fascinating is that our crc's match (as that check is performed first).
>> 
>> Something is very slightly off and I don't see what it is.
> 
> I printed out orig_len -> 5910532 (which matches vmlinux.bin)
>               bytes_out -> 5910531
> 
>> 
>> After looking at the state variables I would probably start looking
>> at the uncompressed data to see if it really was decompressing
>> properly. If nothing else that is the kind of process that would tend
>> to spark a clue.
> 
> I am not familiar with the code, so very few sparks are flying. I'll
> still dig through though. Thanks for the tips.

I guess the interesting thing to do would be to
- Recompute the crc to see if we still match.
- Possibly instrument flush_window.

I have a strange feeling that the uncompressed data is getting corrupted
after we have flushed the window.

Eric
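Eric's first suggestion is to recompute the CRC over the decompressed output. The gzip CRC-32 uses the reflected polynomial 0xEDB88320, with 0xFFFFFFFF as both the initial value and the final XOR, and can be computed bit by bit as shown below. The boot decompressor builds a lookup table instead (makecrc() in lib/inflate.c). This slower form is only a sketch for cross-checking a dumped buffer, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 as used by gzip: reflected polynomial 0xEDB88320,
 * initial value 0xFFFFFFFF, result XORed with 0xFFFFFFFF. */
static uint32_t crc32_gzip(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The standard check value for this CRC is 0xCBF43926 over the ASCII string "123456789", which is a quick way to confirm the routine before trusting it on kernel data.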
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
From: Don Zickus @ 2006-08-07 17:44 UTC
To: Eric W. Biederman
Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel

On Sat, Aug 05, 2006 at 10:07:01AM -0600, Eric W. Biederman wrote:
> Don Zickus <dzickus@redhat.com> writes:
> 
> >> The length error comes from lib/inflate.c
> >> 
> >> I think it would be interesting to look at orig_len and bytes_out.
> >> 
> >> My hunch is that I have tripped over a tool chain bug or a weird
> >> alignment issue.
> > 
> > I thought so too, but I took vmlinuz images from people (Vivek) who had it
> > boot on their systems but those images still failed on my two machines.
> > 
> >> 
> >> The error is that the uncompressed length does not match the stored length
> >> of the data from before we compressed it. Now what is
> >> fascinating is that our crc's match (as that check is performed first).
> >> 
> >> Something is very slightly off and I don't see what it is.
> > 
> > I printed out orig_len -> 5910532 (which matches vmlinux.bin)
> >               bytes_out -> 5910531
> > 
> >> 
> >> After looking at the state variables I would probably start looking
> >> at the uncompressed data to see if it really was decompressing
> >> properly. If nothing else that is the kind of process that would tend
> >> to spark a clue.
> > 
> > I am not familiar with the code, so very few sparks are flying. I'll
> > still dig through though. Thanks for the tips.
> 
> I guess the interesting thing to do would be to
> - Recompute the crc to see if we still match.
> - Possibly instrument flush_window.
> 
> I have a strange feeling that the uncompressed data is getting corrupted
> after we have flushed the window.

It seems to be an AMD64 vs EM64T problem.
AMD chipsets work but Intel chipsets don't.

I also blindly incremented bytes_out (as a really cheap hack), it didn't
work until I added some random putstr's below it (timing??). Then the
kernel booted.

Still looking into things.

Cheers,
Don

> 
> Eric
> 
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
From: Eric W. Biederman @ 2006-08-07 18:08 UTC
To: Don Zickus
Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel

Don Zickus <dzickus@redhat.com> writes:

> On Sat, Aug 05, 2006 at 10:07:01AM -0600, Eric W. Biederman wrote:
>> Don Zickus <dzickus@redhat.com> writes:
>> 
>> >> The length error comes from lib/inflate.c
>> >> 
>> >> I think it would be interesting to look at orig_len and bytes_out.
>> >> 
>> >> My hunch is that I have tripped over a tool chain bug or a weird
>> >> alignment issue.
>> > 
>> > I thought so too, but I took vmlinuz images from people (Vivek) who had it
>> > boot on their systems but those images still failed on my two machines.
>> > 
>> >> 
>> >> The error is that the uncompressed length does not match the stored length
>> >> of the data from before we compressed it. Now what is
>> >> fascinating is that our crc's match (as that check is performed first).
>> >> 
>> >> Something is very slightly off and I don't see what it is.
>> > 
>> > I printed out orig_len -> 5910532 (which matches vmlinux.bin)
>> >               bytes_out -> 5910531
>> > 
>> >> 
>> >> After looking at the state variables I would probably start looking
>> >> at the uncompressed data to see if it really was decompressing
>> >> properly. If nothing else that is the kind of process that would tend
>> >> to spark a clue.
>> > 
>> > I am not familiar with the code, so very few sparks are flying. I'll
>> > still dig through though. Thanks for the tips.
>> 
>> I guess the interesting thing to do would be to
>> - Recompute the crc to see if we still match.
>> - Possibly instrument flush_window.
>> 
>> I have a strange feeling that the uncompressed data is getting corrupted
>> after we have flushed the window.
> 
> It seems to be an AMD64 vs EM64T problem.
AMD chipsets work but Intel > chipsets don't. > > I also blindly incremented bytes_out (as a really cheap hack), it didn't > work until I added some random putstr's below it (timing??). Then the > kernel booted. > > Still looking into things. Odd. I wonder if I'm missing a serializing instruction somewhere, to ensure the effects of ``self modifying code'' aren't a problem. As I read Intels Documentation if you have a jump before you get to the code there shouldn't be a problem. Still that doesn't really explain bytes_out. Eric ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages 2006-08-07 18:08 ` Eric W. Biederman @ 2006-08-07 23:57 ` Don Zickus 2006-08-08 5:01 ` Eric W. Biederman 2006-08-08 23:36 ` Andi Kleen 0 siblings, 2 replies; 47+ messages in thread
From: Don Zickus @ 2006-08-07 23:57 UTC
To: Eric W. Biederman
Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel

> >> >> The error is the uncompressed length does not match the stored length
> >> >> of the data from before we compressed it. Now what is
> >> >> fascinating is that our crc's match (as that check is performed first).
> >> >>
> >> >> Something is very slightly off and I don't see what it is.
> >> >
> >> > I printed out orig_len -> 5910532 (which matches vmlinux.bin)
> >> > bytes_out -> 5910531
> >
> > It seems to be an AMD64 vs EM64T problem. AMD chipsets work but Intel
> > chipsets don't.
> >
> > I also blindly incremented bytes_out (as a really cheap hack), it didn't
> > work until I added some random putstr's below it (timing??). Then the
> > kernel booted.
> >
> > Still looking into things.
>
> Odd. I wonder if I'm missing a serializing instruction somewhere,
> to ensure the effects of ``self modifying code'' aren't a problem.
> As I read Intel's documentation, if you have a jump before you get
> to the code there shouldn't be a problem.
>
> Still that doesn't really explain bytes_out.

So I narrowed down the problem but it isn't obvious to me why this problem
exists. Basically, even though bytes_out is supposed to be initialized to
0, it becomes -1 before entering decompress_kernel(). Of course, the
fallout is that in flush_window() bytes_out winds up being one less than
outcnt, and hence my original problem.

Any thoughts on how to debug where this could be getting corrupted?

Cheers,
Don
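The symptom above (the CRC matches but the length check is off by one) can be reproduced with a small C model. This is a hypothetical sketch, not lib/inflate.c: `struct out_state`, `crc32_update`, `flush_window_model` and `check_trailer` are illustrative names. It assumes, as the thread describes, that the CRC and the output length are checked separately against the gzip trailer, CRC first. In the model the CRC state is passed in explicitly, while the byte counter relies on starting at 0, so a counter that starts at -1 fails only the length check even though the data is intact.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the decompressor's output bookkeeping. */
struct out_state {
    uint32_t crc;   /* CRC-32 of everything flushed so far */
    long bytes_out; /* running output length; must start at 0 */
};

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, as used by gzip).
 * Passing a previous result as `crc` continues the computation. */
static uint32_t crc32_update(uint32_t crc, const unsigned char *p, size_t n)
{
    crc = ~crc;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Called once per filled output window, in the spirit of flush_window(). */
static void flush_window_model(struct out_state *s,
                               const unsigned char *window, size_t outcnt)
{
    assert(window != NULL || outcnt == 0);
    s->crc = crc32_update(s->crc, window, outcnt);
    s->bytes_out += (long)outcnt;
}

/* CRC is checked first, then the length.
 * Returns 0 if both match, 1 on a CRC error, 2 on a length error. */
static int check_trailer(const struct out_state *s,
                         uint32_t orig_crc, long orig_len)
{
    if (s->crc != orig_crc)
        return 1;
    if (s->bytes_out != orig_len)
        return 2;
    return 0;
}
```

Starting `bytes_out` at -1 (the stale value Don observed) reproduces the pattern reported above: the CRC check passes and only the length check fails, with the final count one short of the original length.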
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages 2006-08-07 23:57 ` Don Zickus @ 2006-08-08 5:01 ` Eric W. Biederman 2006-08-08 19:36 ` Don Zickus 2006-08-09 20:06 ` Don Zickus 2006-08-08 23:36 ` Andi Kleen 1 sibling, 2 replies; 47+ messages in thread
From: Eric W. Biederman @ 2006-08-08 5:01 UTC
To: Don Zickus
Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel

Don Zickus <dzickus@redhat.com> writes:

>> >> >> The error is the uncompressed length does not match the stored length
>> >> >> of the data from before we compressed it. Now what is
>> >> >> fascinating is that our crc's match (as that check is performed first).
>> >> >>
>> >> >> Something is very slightly off and I don't see what it is.
>> >> >
>> >> > I printed out orig_len -> 5910532 (which matches vmlinux.bin)
>> >> > bytes_out -> 5910531
>> >
>> > It seems to be an AMD64 vs EM64T problem. AMD chipsets work but Intel
>> > chipsets don't.
>> >
>> > I also blindly incremented bytes_out (as a really cheap hack), it didn't
>> > work until I added some random putstr's below it (timing??). Then the
>> > kernel booted.
>> >
>> > Still looking into things.
>>
>> Odd. I wonder if I'm missing a serializing instruction somewhere,
>> to ensure the effects of ``self modifying code'' aren't a problem.
>> As I read Intel's documentation, if you have a jump before you get
>> to the code there shouldn't be a problem.
>>
>> Still that doesn't really explain bytes_out.
>
> So I narrowed down the problem but it isn't obvious to me why this problem
> exists. Basically, even though bytes_out is supposed to be initialized to
> 0, it becomes -1 before entering decompress_kernel(). Of course, the
> fallout is that in flush_window() bytes_out winds up being one less than
> outcnt, and hence my original problem.
>
> Any thoughts on how to debug where this could be getting corrupted?

Looking at my build it appears bytes_out is being placed in the .bss.
A little odd since it is zero initialized but no big deal.
Could you confirm that bytes_out is being placed in the .bss section
by inspecting arch/x86_64/boot/compressed/misc.o and
arch/x86_64/boot/compressed/vmlinux? My technique was "readelf -a $file",
then looking up the section number and looking at the section table to
see which section it is.

If bytes_out is in the .bss for you then I suspect something is not
correctly zeroing the .bss. Or else the .bss is being stomped.

I'm not certain how rep stosb can be done wrong but some bad pointer
math could have done it.

Eric
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages 2006-08-08 5:01 ` Eric W. Biederman @ 2006-08-08 19:36 ` Don Zickus 2006-08-09 20:06 ` Don Zickus 1 sibling, 0 replies; 47+ messages in thread
From: Don Zickus @ 2006-08-08 19:36 UTC
To: Eric W. Biederman
Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel

On Mon, Aug 07, 2006 at 11:01:53PM -0600, Eric W. Biederman wrote:
> Don Zickus <dzickus@redhat.com> writes:
>
> >> >> >> The error is the uncompressed length does not match the stored length
> >> >> >> of the data from before we compressed it. Now what is
> >> >> >> fascinating is that our crc's match (as that check is performed first).
> >> >> >>
> >> >> >> Something is very slightly off and I don't see what it is.
> >> >> >
> >> >> > I printed out orig_len -> 5910532 (which matches vmlinux.bin)
> >> >> > bytes_out -> 5910531
> >> >
> >> > It seems to be an AMD64 vs EM64T problem. AMD chipsets work but Intel
> >> > chipsets don't.
> >> >
> >> > I also blindly incremented bytes_out (as a really cheap hack), it didn't
> >> > work until I added some random putstr's below it (timing??). Then the
> >> > kernel booted.
> >> >
> >> > Still looking into things.
> >>
> >> Odd. I wonder if I'm missing a serializing instruction somewhere,
> >> to ensure the effects of ``self modifying code'' aren't a problem.
> >> As I read Intel's documentation, if you have a jump before you get
> >> to the code there shouldn't be a problem.
> >>
> >> Still that doesn't really explain bytes_out.
> >
> > So I narrowed down the problem but it isn't obvious to me why this problem
> > exists. Basically, even though bytes_out is supposed to be initialized to
> > 0, it becomes -1 before entering decompress_kernel(). Of course, the
> > fallout is that in flush_window() bytes_out winds up being one less than
> > outcnt, and hence my original problem.
> >
> > Any thoughts on how to debug where this could be getting corrupted?
>
> Looking at my build it appears bytes_out is being placed in the .bss.
> A little odd since it is zero initialized but no big deal.
> Could you confirm that bytes_out is being placed in the .bss section
> by inspecting arch/x86_64/boot/compressed/misc.o and
> arch/x86_64/boot/compressed/vmlinux? My technique was "readelf -a $file",
> then looking up the section number and looking at the section table to
> see which section it is.

Yes, bytes_out is in the .bss for both files.

>
> If bytes_out is in the .bss for you then I suspect something is not
> correctly zeroing the .bss. Or else the .bss is being stomped.
>
> I'm not certain how rep stosb can be done wrong but some bad pointer
> math could have done it.

Even worse, from the time the .bss is cleared to the time gunzip() is
called inside decompress_kernel(), there is very little code to do any
stomping. So I am stuck trying to debug this.

This code seems very fragile. The more debug code I add (i.e. putstr) the
more the length is off (varies from -32 to +1). Makes me scratch my head
as to what is really going on here.

I created a really pathetic patch to get the thing to boot but even that
doesn't make sense.

diff --git a/arch/x86_64/boot/compressed/misc.c b/arch/x86_64/boot/compressed/misc.c
index 0e6c4b7..614416e 100644
--- a/arch/x86_64/boot/compressed/misc.c
+++ b/arch/x86_64/boot/compressed/misc.c
@@ -183,6 +183,7 @@ #define OLD_CL_MAGIC 0xA33F
 extern unsigned char input_data[];
 extern int input_len;
 
+static long dummy;
 static long bytes_out = 0;
 
 static void *malloc(int size);
@@ -594,6 +595,7 @@ asmlinkage void decompress_kernel(void *
 	if ((ulg)output >= 0xffffffffffUL)
 		error("Destination address too large");
 
+	bytes_out = 0;
 	makecrc();
 	putstr(".\nDecompressing Linux...");
 	gunzip();

And yes, the 'dummy' variable needs to be there. I am trying to use gdb on
vmlinux to fish for clues. But I am at a loss right now.

Cheers,
Don

>
> Eric
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages 2006-08-08 5:01 ` Eric W. Biederman 2006-08-08 19:36 ` Don Zickus @ 2006-08-09 20:06 ` Don Zickus 2006-08-10 6:09 ` Eric W. Biederman 1 sibling, 1 reply; 47+ messages in thread
From: Don Zickus @ 2006-08-09 20:06 UTC
To: Eric W. Biederman
Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel, vgoyal

> Looking at my build it appears bytes_out is being placed in the .bss.
> A little odd since it is zero initialized but no big deal.
> Could you confirm that bytes_out is being placed in the .bss section
> by inspecting arch/x86_64/boot/compressed/misc.o and
> arch/x86_64/boot/compressed/vmlinux? My technique was "readelf -a $file",
> then looking up the section number and looking at the section table to
> see which section it is.
>
> If bytes_out is in the .bss for you then I suspect something is not
> correctly zeroing the .bss. Or else the .bss is being stomped.
>
> I'm not certain how rep stosb can be done wrong but some bad pointer
> math could have done it.
>
> Eric

It seems Vivek came up with a solution that works. He sent it to me this
morning. We tested a bunch of machines and things seem to work now. It
looks like it mimics the i386 behaviour now.

Cheers,
Don

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/boot/compressed/head.S |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff -puN arch/x86_64/boot/compressed/head.S~x86_64-bss-clearing-test arch/x86_64/boot/compressed/head.S
--- linux-2.6.18-rc3-1M/arch/x86_64/boot/compressed/head.S~x86_64-bss-clearing-test	2006-08-09 09:43:17.000000000 -0400
+++ linux-2.6.18-rc3-1M-root/arch/x86_64/boot/compressed/head.S	2006-08-09 09:43:34.000000000 -0400
@@ -235,8 +235,8 @@ relocated:
 	/*
 	 * Clear BSS
 	 */
-	movq $_edata, %rdi
-	movq $_end, %rcx
+	leaq _edata(%rbx), %rdi
+	leaq _end(%rbx), %rcx
 	subq %rdi, %rcx
 	cld
 	rep
_
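The difference between the two instruction sequences in Vivek's patch can be illustrated with a small C model. This is a hypothetical sketch: it assumes, as the `leaq _edata(%rbx)` form suggests, that the decompressor is linked at address 0 and that `%rbx` holds the address it actually runs at. The simulated memory array, the `LINK_EDATA`/`LINK_END` offsets and both function names are invented for illustration. Using the link-time values as absolute addresses zeroes the wrong range, so the real .bss, including any variable in it such as bytes_out, keeps whatever was in memory before.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: link-time offsets of _edata and _end, with the
 * image linked at address 0, inside a simulated memory of MEM_SIZE bytes. */
enum { LINK_EDATA = 256, LINK_END = 384, MEM_SIZE = 4096 };

/* Models "movq $_edata, %rdi; movq $_end, %rcx": the link-time values are
 * used as absolute addresses, so the runtime base never enters the math. */
static void clear_bss_absolute(uint8_t *mem, size_t base)
{
    (void)base; /* the bug: relocation offset ignored */
    memset(mem + LINK_EDATA, 0, LINK_END - LINK_EDATA);
}

/* Models "leaq _edata(%rbx), %rdi; leaq _end(%rbx), %rcx": the runtime
 * base held in %rbx is added to both bounds before clearing. */
static void clear_bss_relocated(uint8_t *mem, size_t base)
{
    assert(base + LINK_END <= MEM_SIZE);
    memset(mem + base + LINK_EDATA, 0, LINK_END - LINK_EDATA);
}
```

With the image running at offset 1024, the absolute variant zeroes bytes 256..383 and leaves the relocated .bss at 1280..1407 untouched; the relocated variant clears exactly that range. Which stale value ends up in an uncleared variable depends on whatever was in memory before, which would fit the varying length errors Don reported as debug code was added.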
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages 2006-08-09 20:06 ` Don Zickus @ 2006-08-10 6:09 ` Eric W. Biederman 2006-08-10 13:13 ` Vivek Goyal 0 siblings, 1 reply; 47+ messages in thread
From: Eric W. Biederman @ 2006-08-10 6:09 UTC
To: Don Zickus
Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel, vgoyal

Don Zickus <dzickus@redhat.com> writes:

>> Looking at my build it appears bytes_out is being placed in the .bss.
>> A little odd since it is zero initialized but no big deal.
>> Could you confirm that bytes_out is being placed in the .bss section
>> by inspecting arch/x86_64/boot/compressed/misc.o and
>> arch/x86_64/boot/compressed/vmlinux? My technique was "readelf -a $file",
>> then looking up the section number and looking at the section table to
>> see which section it is.
>>
>> If bytes_out is in the .bss for you then I suspect something is not
>> correctly zeroing the .bss. Or else the .bss is being stomped.
>>
>> I'm not certain how rep stosb can be done wrong but some bad pointer
>> math could have done it.
>>
>> Eric
>
> It seems Vivek came up with a solution that works. He sent it to me this
> morning. We tested a bunch of machines and things seem to work now. It
> looks like it mimics the i386 behaviour now.

Yes, this looks right. It looks like I forgot to make this change when
the logic from i386 was adapted to x86_64, ages ago.

This is exactly the place in the code I would have expected a bug
from the symptoms you were seeing.

Thanks all, I will include this in my version of the patches.

Eric
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages 2006-08-10 6:09 ` Eric W. Biederman @ 2006-08-10 13:13 ` Vivek Goyal 2006-08-10 17:05 ` Eric W. Biederman 0 siblings, 1 reply; 47+ messages in thread
From: Vivek Goyal @ 2006-08-10 13:13 UTC
To: Eric W. Biederman
Cc: Don Zickus, fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel

On Thu, Aug 10, 2006 at 12:09:56AM -0600, Eric W. Biederman wrote:
> Don Zickus <dzickus@redhat.com> writes:
>
> >> Looking at my build it appears bytes_out is being placed in the .bss.
> >> A little odd since it is zero initialized but no big deal.
> >> Could you confirm that bytes_out is being placed in the .bss section
> >> by inspecting arch/x86_64/boot/compressed/misc.o and
> >> arch/x86_64/boot/compressed/vmlinux? My technique was "readelf -a $file",
> >> then looking up the section number and looking at the section table to
> >> see which section it is.
> >>
> >> If bytes_out is in the .bss for you then I suspect something is not
> >> correctly zeroing the .bss. Or else the .bss is being stomped.
> >>
> >> I'm not certain how rep stosb can be done wrong but some bad pointer
> >> math could have done it.
> >>
> >> Eric
> >
> > It seems Vivek came up with a solution that works. He sent it to me this
> > morning. We tested a bunch of machines and things seem to work now. It
> > looks like it mimics the i386 behaviour now.
>
> Yes, this looks right. It looks like I forgot to make this change when
> the logic from i386 was adapted to x86_64, ages ago.
>
> This is exactly the place in the code I would have expected a bug
> from the symptoms you were seeing.
>
> Thanks all, I will include this in my version of the patches.

Apart from this I think something is still off on x86_64. I have not
been able to make kdump work on x86_64. The second kernel simply hangs.
Two different machines are showing different results.

- On one machine, it seems to be stuck somewhere in decompress_kernel().
  The serial console is not behaving properly even with earlyprintk().
  Somehow I feel it is some bss corruption even after my changes.

- The other machine seems to be going till start_kernel() and even after
  that (no messages on the console, all serial debugging) and then
  either it hangs or jumps back to BIOS.

Will look more into it.

Thanks
Vivek
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages 2006-08-10 13:13 ` Vivek Goyal @ 2006-08-10 17:05 ` Eric W. Biederman 2006-08-10 18:18 ` Vivek Goyal 0 siblings, 1 reply; 47+ messages in thread
From: Eric W. Biederman @ 2006-08-10 17:05 UTC
To: vgoyal
Cc: Don Zickus, fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel

Vivek Goyal <vgoyal@in.ibm.com> writes:

> Apart from this I think something is still off on x86_64. I have not
> been able to make kdump work on x86_64. The second kernel simply hangs.
> Two different machines are showing different results.
>
> - On one machine, it seems to be stuck somewhere in decompress_kernel().
>   The serial console is not behaving properly even with earlyprintk().
>   Somehow I feel it is some bss corruption even after my changes.
>
> - The other machine seems to be going till start_kernel() and even after
>   that (no messages on the console, all serial debugging) and then
>   either it hangs or jumps back to BIOS.
>
> Will look more into it.

Thanks.

I'm a little disappointed but at this point it isn't a great surprise;
the code is early yet and hasn't had much testing or attention.
I wonder if I have missed something else silly.

As for testing, can you use plain kexec to load the kernel at a
different address? I'm curious to know if it is something related
to the kexec on panic path or if it is just running at a different
location that is the problem.

I'm back on the namespace stuff this week so it will be a while before
I get back to this. It doesn't look like I have time to work the whole
patchset at once. So my current plan is to take as many pieces as make
sense by themselves and push them upstream, until we get down to just
the relocatable kernel patches that are outstanding.

Everything was fairly well received on the round of reviews, with some
minor nits that needed to be picked. So I think this is doable.

Eric
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages 2006-08-10 17:05 ` Eric W. Biederman @ 2006-08-10 18:18 ` Vivek Goyal 2006-08-10 20:09 ` Eric W. Biederman 0 siblings, 1 reply; 47+ messages in thread
From: Vivek Goyal @ 2006-08-10 18:18 UTC
To: Eric W. Biederman
Cc: Don Zickus, fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel

On Thu, Aug 10, 2006 at 11:05:22AM -0600, Eric W. Biederman wrote:
> Vivek Goyal <vgoyal@in.ibm.com> writes:
>
> > Apart from this I think something is still off on x86_64. I have not
> > been able to make kdump work on x86_64. The second kernel simply hangs.
> > Two different machines are showing different results.
> >
> > - On one machine, it seems to be stuck somewhere in decompress_kernel().
> >   The serial console is not behaving properly even with earlyprintk().
> >   Somehow I feel it is some bss corruption even after my changes.
> >
> > - The other machine seems to be going till start_kernel() and even after
> >   that (no messages on the console, all serial debugging) and then
> >   either it hangs or jumps back to BIOS.
> >
> > Will look more into it.
>
> Thanks.
>
> I'm a little disappointed but at this point it isn't a great surprise;
> the code is early yet and hasn't had much testing or attention.
> I wonder if I have missed something else silly.
>
> As for testing, can you use plain kexec to load the kernel at a
> different address? I'm curious to know if it is something related
> to the kexec on panic path or if it is just running at a different
> location that is the problem.

Yes. This seems to be minor stuff. The parameter segment seems to be
getting stomped while I am doing decompression. Most probably this is
coming from the extra space calculations (32K etc.) being done at run
time to find out where we should shift the compressed image.

Kexec works because the parameter segment is being loaded below the
compressed image and does not get stomped over. :-)

I just reserved memory at non 2MB aligned location 65MB@15MB so that
the kernel is loaded at 16MB and the other smaller segments below the
compressed image; then I could successfully boot into the kdump kernel.

So basically the kexec on panic path seems to be clean except for the
stomping issue. Maybe the bzImage program header should reflect the
right "MemSize", which takes into account the extra memory space
calculations.

Thanks
Vivek
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages 2006-08-10 18:18 ` Vivek Goyal @ 2006-08-10 20:09 ` Eric W. Biederman 2006-08-11 21:25 ` Don Zickus 2006-08-14 16:51 ` [Fastboot] " Vivek Goyal 0 siblings, 2 replies; 47+ messages in thread
From: Eric W. Biederman @ 2006-08-10 20:09 UTC
To: vgoyal
Cc: Don Zickus, fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel

Vivek Goyal <vgoyal@in.ibm.com> writes:

> On Thu, Aug 10, 2006 at 11:05:22AM -0600, Eric W. Biederman wrote:
>> Vivek Goyal <vgoyal@in.ibm.com> writes:
>>
>> > Apart from this I think something is still off on x86_64. I have not
>> > been able to make kdump work on x86_64. The second kernel simply hangs.
>> > Two different machines are showing different results.
>> >
>> > - On one machine, it seems to be stuck somewhere in decompress_kernel().
>> >   The serial console is not behaving properly even with earlyprintk().
>> >   Somehow I feel it is some bss corruption even after my changes.
>> >
>> > - The other machine seems to be going till start_kernel() and even after
>> >   that (no messages on the console, all serial debugging) and then
>> >   either it hangs or jumps back to BIOS.
>> >
>> > Will look more into it.
>>
>> Thanks.
>>
>> I'm a little disappointed but at this point it isn't a great surprise;
>> the code is early yet and hasn't had much testing or attention.
>> I wonder if I have missed something else silly.
>>
>> As for testing, can you use plain kexec to load the kernel at a
>> different address? I'm curious to know if it is something related
>> to the kexec on panic path or if it is just running at a different
>> location that is the problem.
>
> Yes. This seems to be minor stuff. The parameter segment seems to be
> getting stomped while I am doing decompression. Most probably this is
> coming from the extra space calculations (32K etc.) being done at run
> time to find out where we should shift the compressed image.
>
> Kexec works because the parameter segment is being loaded below the
> compressed image and does not get stomped over. :-)

Ah. That makes sense.

> I just reserved memory at non 2MB aligned location 65MB@15MB so that
> the kernel is loaded at 16MB and the other smaller segments below the
> compressed image; then I could successfully boot into the kdump kernel.

:)

> So basically the kexec on panic path seems to be clean except for the
> stomping issue. Maybe the bzImage program header should reflect the
> right "MemSize", which takes into account the extra memory space
> calculations.

Yes. That sounds like the right thing to do.

I remember trying to compute a good memsize when I created the bzImage
header but it is completely possible I missed some part of the
calculation or assumed that the kernel's .bss section would always be
larger than what I needed for decompression.

Eric
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages 2006-08-10 20:09 ` Eric W. Biederman @ 2006-08-11 21:25 ` Don Zickus 2006-08-12 7:20 ` Eric W. Biederman 2006-08-14 16:51 ` [Fastboot] " Vivek Goyal 1 sibling, 1 reply; 47+ messages in thread
From: Don Zickus @ 2006-08-11 21:25 UTC
To: Eric W. Biederman
Cc: vgoyal, fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel

> >>
> >> I'm a little disappointed but at this point it isn't a great surprise;
> >> the code is early yet and hasn't had much testing or attention.
> >> I wonder if I have missed something else silly.
> >>
> >> As for testing, can you use plain kexec to load the kernel at a
> >> different address? I'm curious to know if it is something related
> >> to the kexec on panic path or if it is just running at a different
> >> location that is the problem.
> >

I think I have found the 'something silly'. Here is a patch that allows
our Dell em64t boxes to boot. This change matches the original code. The
main difference that caused the problems was the setting of the _PAGE_NX
bit. This caused issues in early_io_remap().

Thanks to Larry Woodman for debugging this.

Cheers,
Don

Signed-off-by: Don Zickus <dzickus@redhat.com>

--- linux-2.6.17.noarch/arch/x86_64/mm/init.c.orig	2006-08-11 12:35:58.000000000 -0400
+++ linux-2.6.17.noarch/arch/x86_64/mm/init.c	2006-08-11 13:14:20.000000000 -0400
@@ -196,7 +196,7 @@
 	vaddr += addr & ~PMD_MASK;
 	addr &= PMD_MASK;
 	for (i = 0; i < pmds; i++, addr += PMD_SIZE)
-		set_pmd(pmd + i,__pmd(addr | __PAGE_KERNEL_LARGE));
+		set_pmd(pmd + i,__pmd(addr | _KERNPG_TABLE | _PAGE_PSE));
 	__flush_tlb();
 	return (void *)vaddr;
 next:
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages 2006-08-11 21:25 ` Don Zickus @ 2006-08-12 7:20 ` Eric W. Biederman 2006-08-12 15:25 ` Don Zickus 2006-08-13 20:06 ` Andi Kleen 0 siblings, 2 replies; 47+ messages in thread
From: Eric W. Biederman @ 2006-08-12 7:20 UTC
To: Don Zickus
Cc: vgoyal, fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel

Don Zickus <dzickus@redhat.com> writes:

>> >>
>> >> I'm a little disappointed but at this point it isn't a great surprise;
>> >> the code is early yet and hasn't had much testing or attention.
>> >> I wonder if I have missed something else silly.
>> >>
>> >> As for testing, can you use plain kexec to load the kernel at a
>> >> different address? I'm curious to know if it is something related
>> >> to the kexec on panic path or if it is just running at a different
>> >> location that is the problem.
>> >
>
> I think I have found the 'something silly'. Here is a patch that allows
> our Dell em64t boxes to boot. This change matches the original code. The
> main difference that caused the problems was the setting of the _PAGE_NX
> bit. This caused issues in early_io_remap().
>
> Thanks to Larry Woodman for debugging this.

This looks like a different one but looks fairly sane.

Do you know what code had problems having _PAGE_NX set?
What are we doing with early_ioremap that requires execute
permissions? It doesn't sound right that we would need
this.

Eric
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages 2006-08-12 7:20 ` Eric W. Biederman @ 2006-08-12 15:25 ` Don Zickus 2006-08-12 19:41 ` Eric W. Biederman 2006-08-13 20:06 ` Andi Kleen 1 sibling, 1 reply; 47+ messages in thread
From: Don Zickus @ 2006-08-12 15:25 UTC
To: Eric W. Biederman
Cc: vgoyal, fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel

On Sat, Aug 12, 2006 at 01:20:29AM -0600, Eric W. Biederman wrote:
> Don Zickus <dzickus@redhat.com> writes:
>
> >> >>
> >> >> I'm a little disappointed but at this point it isn't a great surprise;
> >> >> the code is early yet and hasn't had much testing or attention.
> >> >> I wonder if I have missed something else silly.
> >> >>
> >> >> As for testing, can you use plain kexec to load the kernel at a
> >> >> different address? I'm curious to know if it is something related
> >> >> to the kexec on panic path or if it is just running at a different
> >> >> location that is the problem.
> >> >
> >
> > I think I have found the 'something silly'. Here is a patch that allows
> > our Dell em64t boxes to boot. This change matches the original code. The
> > main difference that caused the problems was the setting of the _PAGE_NX
> > bit. This caused issues in early_io_remap().
> >
> > Thanks to Larry Woodman for debugging this.
>
> This looks like a different one but looks fairly sane.
>
> Do you know what code had problems having _PAGE_NX set?
> What are we doing with early_ioremap that requires execute
> permissions? It doesn't sound right that we would need
> this.

This fix is only needed for a subset of our em64t boxes, so it could be
just a chipset problem. Supposedly, if I remember the conversation
correctly, when the kernel first boots it reserves about 40MB and about 20
pmds automatically. After decompression, early_io_remap tries to set up
all the memory. The conflict arose when early_io_remap tried to reuse one
of those pmds. This caused the system to crash and reboot.

I'll try to get more info Monday on the specifics.

Cheers,
Don

>
> Eric
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages 2006-08-12 15:25 ` Don Zickus @ 2006-08-12 19:41 ` Eric W. Biederman 0 siblings, 0 replies; 47+ messages in thread
From: Eric W. Biederman @ 2006-08-12 19:41 UTC
To: Don Zickus
Cc: vgoyal, fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel

Don Zickus <dzickus@redhat.com> writes:

>> This looks like a different one but looks fairly sane.
>>
>> Do you know what code had problems having _PAGE_NX set?
>> What are we doing with early_ioremap that requires execute
>> permissions? It doesn't sound right that we would need
>> this.
>
> This fix is only needed for a subset of our em64t boxes, so it could be
> just a chipset problem. Supposedly, if I remember the conversation
> correctly, when the kernel first boots it reserves about 40MB and about 20
> pmds automatically. After decompression, early_io_remap tries to set up
> all the memory. The conflict arose when early_io_remap tried to reuse one
> of those pmds. This caused the system to crash and reboot.
>
> I'll try to get more info Monday on the specifics.

Thanks.

Eric
* Re: [CFT] ELF Relocatable x86 and x86_64 bzImages 2006-08-12 7:20 ` Eric W. Biederman 2006-08-12 15:25 ` Don Zickus @ 2006-08-13 20:06 ` Andi Kleen 2006-08-13 21:44 ` Eric W. Biederman 1 sibling, 1 reply; 47+ messages in thread
From: Andi Kleen @ 2006-08-13 20:06 UTC
To: Eric W. Biederman
Cc: fastboot, Jan Kratochvil, Horms, H. Peter Anvin, Magnus Damm, linux-kernel, dzickus

ebiederm@xmission.com (Eric W. Biederman) writes:
>
> Do you know what code had problems having _PAGE_NX set?
> What are we doing with early_ioremap that requires execute
> permissions? It doesn't sound right that we would need
> this.

The early EM64T CPUs didn't support NX and would GPF when
they hit the bit. That is why you always need to mask
with __supported_pte_mask when using _PAGE_NX.

-Andi
* Re: [CFT] ELF Relocatable x86 and x86_64 bzImages 2006-08-13 20:06 ` Andi Kleen @ 2006-08-13 21:44 ` Eric W. Biederman 0 siblings, 0 replies; 47+ messages in thread
From: Eric W. Biederman @ 2006-08-13 21:44 UTC
To: Andi Kleen
Cc: fastboot, Jan Kratochvil, Horms, H. Peter Anvin, Magnus Damm, linux-kernel, dzickus

Andi Kleen <ak@suse.de> writes:

> ebiederm@xmission.com (Eric W. Biederman) writes:
>>
>> Do you know what code had problems having _PAGE_NX set?
>> What are we doing with early_ioremap that requires execute
>> permissions? It doesn't sound right that we would need
>> this.
>
> The early EM64T CPUs didn't support NX and would GPF when
> they hit the bit. That is why you always need to mask
> with __supported_pte_mask when using _PAGE_NX.

Ok. Thanks. That explains it. The NX bit itself causes the GPF, not
someone trying to execute data on a page.

Eric
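Andi's rule can be shown as a small C sketch. The bit positions are the architectural x86-64 ones (present is bit 0, RW bit 1, accessed bit 5, dirty bit 6, PSE bit 7, NX bit 63). `supported_pte_mask`, `init_pte_mask` and `make_large_pmd` are hypothetical stand-ins for the kernel's `__supported_pte_mask` and its `set_pmd`/`__pmd` usage, not the kernel's actual code. Filtering every entry through the supported-bits mask drops NX on CPUs that cannot handle it, instead of writing it into a page-table entry where those CPUs treat it as a reserved bit and fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86-64 page-table entry bits. */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_RW       (1ULL << 1)
#define PTE_ACCESSED (1ULL << 5)
#define PTE_DIRTY    (1ULL << 6)
#define PTE_PSE      (1ULL << 7)  /* 2 MiB page when set in a PMD entry */
#define PTE_NX       (1ULL << 63) /* reserved unless NX is enabled */

#define PMD_SIZE_2M  (1ULL << 21)

/* Hypothetical stand-in for __supported_pte_mask: every bit is allowed
 * until boot code learns that the CPU lacks NX. */
static uint64_t supported_pte_mask = ~0ULL;

static void init_pte_mask(bool cpu_has_nx)
{
    supported_pte_mask = cpu_has_nx ? ~0ULL : ~PTE_NX;
}

/* Build a 2 MiB mapping entry; the requested flags are filtered through
 * the supported-bits mask before they reach the page table. */
static uint64_t make_large_pmd(uint64_t phys, uint64_t flags)
{
    assert((phys & (PMD_SIZE_2M - 1)) == 0);
    return (phys | flags) & supported_pte_mask;
}
```

Don's patch avoids the problem by leaving NX out of the flags entirely (`_KERNPG_TABLE | _PAGE_PSE`, per his description). Masking, as Andi suggests, keeps NX protection on CPUs that support it and silently drops it on those that do not.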
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-10 20:09 ` Eric W. Biederman
  2006-08-11 21:25 ` Don Zickus
@ 2006-08-14 16:51 ` Vivek Goyal
  2006-08-14 17:04 ` H. Peter Anvin
  1 sibling, 1 reply; 47+ messages in thread
From: Vivek Goyal @ 2006-08-14 16:51 UTC
To: Eric W. Biederman
Cc: Don Zickus, fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2126 bytes --]

On Thu, Aug 10, 2006 at 02:09:58PM -0600, Eric W. Biederman wrote:
> > I just reserved memory at non 2MB aligned location 65MB@15MB so that
> > kernel is loaded at 16MB and other smaller segments below the compressed
> > image, then I can successfully booted into the kdump kernel.
>
> :)
>
> > So basically kexec on panic path seems to be clean except stomping issue.
> > May be bzImage program header should reflect right "MemSize" which
> > takes into account extra memory space calculations.
>
> Yes. That sounds like the right thing to do.
>
> I remember trying to compute a good memsize when I created the bzImage
> header but it is completely possible I missed some part of the
> calculation or assumed that the kernels .bss section would always be
> larger than what I needed for decompression.
>

Hi Eric,

Please find attached a patch to fix the issue. It accounts for a few more
things that can consume memory beyond "MemSize", as described in misc.c.

Regarding the decompressor using the kernel's .bss area, I think that
might not be possible, as the kernel .bss is part of the raw binary being
generated (vmlinux.bin). So effectively it becomes part of the input data
and of the compressed output (vmlinux.bin.gz).

I think objcopy generally does not output the bss section into a raw
binary, but in the kernel's case .bss sits somewhere in the middle of the
final image rather than at the end, which could be why objcopy is
outputting bss into the raw binary image as well.

In the case of the second objcopy, where we generate vmlinux.bin from the
compressed kernel vmlinux (the vmlinux containing the decompressor code),
the bss section does not seem to be part of the output raw binary. That
is why I had to pass additional arguments to tools/build.c, to determine
the exact memory requirements of the compressed vmlinux.

So the decompressor cannot use the kernel's .bss for its execution.
Shouldn't we therefore take the decompressor's memory requirements into
account while calculating "MemSize", irrespective of the kernel's .bss
size? Am I missing something?

If this seems reasonable, I can roll out a similar patch for i386 too.

Thanks & Regards
Vivek

[-- Attachment #2: x86_64-bzImage-mem-size-adjustment-fix.patch --]
[-- Type: text/plain, Size: 9903 bytes --]

o Kdump on x86_64 fails because at run time the bzImage decompression
  consumes more memory than expected and stomps over some of the data
  loaded by kexec immediately after the bzImage.

o How much memory the bzImage will effectively consume at load time is
  exported through the "MemSize" field of the bzImage program headers.

o This patch makes further adjustments while calculating the load-time
  memory requirements of the bzImage, which gives the loader a clue
  about where it is safe to load other data.

o The adjustments are:
  - Add the memory consumed by the decompressor (code + data + bss, etc.).
  - Add the memory required for safe decompression (refer to misc.c).
  - Take into account the heap memory used by the decompressor code.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/boot/Makefile               |    3
 arch/x86_64/boot/compressed/vmlinux.lds |    2
 arch/x86_64/boot/tools/build.c          |  129 ++++++++++++++++++++------------
 3 files changed, 87 insertions(+), 47 deletions(-)

diff -puN arch/x86_64/boot/tools/build.c~x86_64-bzImage-mem-size-adjustment-fix arch/x86_64/boot/tools/build.c
--- linux-2.6.18-rc3-1M/arch/x86_64/boot/tools/build.c~x86_64-bzImage-mem-size-adjustment-fix	2006-08-10 20:05:10.000000000 -0400
+++ linux-2.6.18-rc3-1M-root/arch/x86_64/boot/tools/build.c	2006-08-11 01:45:59.000000000 -0400
@@ -54,8 +54,13 @@ int fd;
 int is_big_kernel;
 
 #define MAX_PHDRS 100
-static Elf64_Ehdr ehdr;
-static Elf64_Phdr phdr[MAX_PHDRS];
+/* Uncompressed kernel vmlinux. */
+static Elf64_Ehdr vmlinux_ehdr;
+static Elf64_Phdr vmlinux_phdr[MAX_PHDRS];
+
+/* Compressed kernel vmlinux (With decompressor code attached)*/
+static Elf64_Ehdr cvmlinux_ehdr;
+static Elf64_Phdr cvmlinux_phdr[MAX_PHDRS];
 
 void die(const char * str, ...)
 {
@@ -98,80 +103,80 @@ void file_open(const char *name)
 		die("Unable to open `%s': %m", name);
 }
 
-static void read_ehdr(void)
+static void read_ehdr(Elf64_Ehdr *ehdr)
 {
-	if (read(fd, &ehdr, sizeof(ehdr)) != sizeof(ehdr)) {
+	if (read(fd, ehdr, sizeof(*ehdr)) != sizeof(*ehdr)) {
 		die("Cannot read ELF header: %s\n",
 			strerror(errno));
 	}
-	if (memcmp(ehdr.e_ident, ELFMAG, 4) != 0) {
+	if (memcmp(ehdr->e_ident, ELFMAG, 4) != 0) {
 		die("No ELF magic\n");
 	}
-	if (ehdr.e_ident[EI_CLASS] != ELFCLASS64) {
+	if (ehdr->e_ident[EI_CLASS] != ELFCLASS64) {
 		die("Not a 64 bit executable\n");
 	}
-	if (ehdr.e_ident[EI_DATA] != ELFDATA2LSB) {
+	if (ehdr->e_ident[EI_DATA] != ELFDATA2LSB) {
 		die("Not a LSB ELF executable\n");
 	}
-	if (ehdr.e_ident[EI_VERSION] != EV_CURRENT) {
+	if (ehdr->e_ident[EI_VERSION] != EV_CURRENT) {
 		die("Unknown ELF version\n");
 	}
 	/* Convert the fields to native endian */
-	ehdr.e_type = elf16_to_cpu(ehdr.e_type);
-	ehdr.e_machine = elf16_to_cpu(ehdr.e_machine);
-	ehdr.e_version = elf32_to_cpu(ehdr.e_version);
-	ehdr.e_entry = elf64_to_cpu(ehdr.e_entry);
-	ehdr.e_phoff = elf64_to_cpu(ehdr.e_phoff);
-	ehdr.e_shoff = elf64_to_cpu(ehdr.e_shoff);
-	ehdr.e_flags = elf32_to_cpu(ehdr.e_flags);
-	ehdr.e_ehsize = elf16_to_cpu(ehdr.e_ehsize);
-	ehdr.e_phentsize = elf16_to_cpu(ehdr.e_phentsize);
-	ehdr.e_phnum = elf16_to_cpu(ehdr.e_phnum);
-	ehdr.e_shentsize = elf16_to_cpu(ehdr.e_shentsize);
-	ehdr.e_shnum = elf16_to_cpu(ehdr.e_shnum);
-	ehdr.e_shstrndx = elf16_to_cpu(ehdr.e_shstrndx);
+	ehdr->e_type = elf16_to_cpu(ehdr->e_type);
+	ehdr->e_machine = elf16_to_cpu(ehdr->e_machine);
+	ehdr->e_version = elf32_to_cpu(ehdr->e_version);
+	ehdr->e_entry = elf64_to_cpu(ehdr->e_entry);
+	ehdr->e_phoff = elf64_to_cpu(ehdr->e_phoff);
+	ehdr->e_shoff = elf64_to_cpu(ehdr->e_shoff);
+	ehdr->e_flags = elf32_to_cpu(ehdr->e_flags);
+	ehdr->e_ehsize = elf16_to_cpu(ehdr->e_ehsize);
+	ehdr->e_phentsize = elf16_to_cpu(ehdr->e_phentsize);
+	ehdr->e_phnum = elf16_to_cpu(ehdr->e_phnum);
+	ehdr->e_shentsize = elf16_to_cpu(ehdr->e_shentsize);
+	ehdr->e_shnum = elf16_to_cpu(ehdr->e_shnum);
+	ehdr->e_shstrndx = elf16_to_cpu(ehdr->e_shstrndx);
 
-	if ((ehdr.e_type != ET_EXEC) && (ehdr.e_type != ET_DYN)) {
+	if ((ehdr->e_type != ET_EXEC) && (ehdr->e_type != ET_DYN)) {
 		die("Unsupported ELF header type\n");
 	}
-	if (ehdr.e_machine != EM_X86_64) {
+	if (ehdr->e_machine != EM_X86_64) {
 		die("Not for x86_64\n");
 	}
-	if (ehdr.e_version != EV_CURRENT) {
+	if (ehdr->e_version != EV_CURRENT) {
 		die("Unknown ELF version\n");
 	}
-	if (ehdr.e_ehsize != sizeof(Elf64_Ehdr)) {
+	if (ehdr->e_ehsize != sizeof(Elf64_Ehdr)) {
 		die("Bad Elf header size\n");
 	}
-	if (ehdr.e_phentsize != sizeof(Elf64_Phdr)) {
+	if (ehdr->e_phentsize != sizeof(Elf64_Phdr)) {
 		die("Bad program header entry\n");
 	}
-	if (ehdr.e_shentsize != sizeof(Elf64_Shdr)) {
+	if (ehdr->e_shentsize != sizeof(Elf64_Shdr)) {
 		die("Bad section header entry\n");
 	}
-	if (ehdr.e_shstrndx >= ehdr.e_shnum) {
+	if (ehdr->e_shstrndx >= ehdr->e_shnum) {
 		die("String table index out of bounds\n");
 	}
 }
 
-static void read_phds(void)
+static void read_phdrs(Elf64_Ehdr *ehdr, Elf64_Phdr *phdr)
 {
 	int i;
 	size_t size;
-	if (ehdr.e_phnum > MAX_PHDRS) {
+	if (ehdr->e_phnum > MAX_PHDRS) {
 		die("%d program headers supported: %d\n",
-			ehdr.e_phnum, MAX_PHDRS);
+			ehdr->e_phnum, MAX_PHDRS);
 	}
-	if (lseek(fd, ehdr.e_phoff, SEEK_SET) < 0) {
+	if (lseek(fd, ehdr->e_phoff, SEEK_SET) < 0) {
 		die("Seek to %d failed: %s\n",
-			ehdr.e_phoff, strerror(errno));
+			ehdr->e_phoff, strerror(errno));
 	}
-	size = sizeof(phdr[0])*ehdr.e_phnum;
-	if (read(fd, &phdr, size) != size) {
-		die("Cannot read ELF section headers: %s\n",
+	size = (sizeof(*phdr))*(ehdr->e_phnum);
+	if (read(fd, phdr, size) != size) {
+		die("Cannot read ELF program headers: %s\n",
 			strerror(errno));
 	}
-	for(i = 0; i < ehdr.e_phnum; i++) {
+	for(i = 0; i < ehdr->e_phnum; i++) {
 		phdr[i].p_type = elf32_to_cpu(phdr[i].p_type);
 		phdr[i].p_flags = elf32_to_cpu(phdr[i].p_flags);
 		phdr[i].p_offset = elf64_to_cpu(phdr[i].p_offset);
@@ -183,13 +188,13 @@ static void read_phds(void)
 	}
 }
 
-uint64_t vmlinux_memsz(void)
+uint64_t elf_exec_memsz(Elf64_Ehdr *ehdr, Elf64_Phdr *phdr)
 {
 	uint64_t min, max, size;
 	int i;
 	max = 0;
 	min = ~max;
-	for(i = 0; i < ehdr.e_phnum; i++) {
+	for(i = 0; i < ehdr->e_phnum; i++) {
 		uint64_t start, end;
 		if (phdr[i].p_type != PT_LOAD)
 			continue;
@@ -200,31 +205,32 @@ uint64_t vmlinux_memsz(void)
 		if (end > max)
 			max = end;
 	}
-	/* Get the reported size by vmlinux */
+	/* Get the reported size by elf exec */
 	size = max - min;
 	return size;
 }
 
 void usage(void)
 {
-	die("Usage: build [-b] bootsect setup system rootdev vmlinux [> image]");
+	die("Usage: build [-b] bootsect setup system rootdev vmlinux vmlinux.bin.gz <vmlinux with decompressor code>[> image]");
 }
 
 int main(int argc, char ** argv)
 {
 	unsigned int i, sz, setup_sectors;
 	uint64_t kernel_offset, kernel_filesz, kernel_memsz;
+	uint64_t vmlinux_memsz, cvmlinux_memsz, vmlinux_gz_size;
 	int c;
 	u32 sys_size;
 	byte major_root, minor_root;
-	struct stat sb;
+	struct stat sb, vmlinux_gz_sb;
 
 	if (argc > 2 && !strcmp(argv[1], "-b"))
 	  {
 	    is_big_kernel = 1;
 	    argc--, argv++;
 	  }
-	if (argc != 6)
+	if (argc != 8)
 		usage();
 	if (!strcmp(argv[4], "CURRENT")) {
 		if (stat("/", &sb)) {
@@ -307,11 +313,42 @@ int main(int argc, char ** argv)
 	}
 	close(fd);
 
+	/* Open uncompressed vmlinux. */
 	file_open(argv[5]);
-	read_ehdr();
-	read_phds();
+	read_ehdr(&vmlinux_ehdr);
+	read_phdrs(&vmlinux_ehdr, vmlinux_phdr);
 	close(fd);
-	kernel_memsz = vmlinux_memsz();
+	vmlinux_memsz = elf_exec_memsz(&vmlinux_ehdr, vmlinux_phdr);
+
+	/* Process vmlinux.bin.gz */
+	file_open(argv[6]);
+	if (fstat (fd, &vmlinux_gz_sb))
+		die("Unable to stat `%s': %m", argv[6]);
+	close(fd);
+	vmlinux_gz_size = vmlinux_gz_sb.st_size;
+
+	/* Process compressed vmlinux (compressed vmlinux + decompressor) */
+	file_open(argv[7]);
+	read_ehdr(&cvmlinux_ehdr);
+	read_phdrs(&cvmlinux_ehdr, cvmlinux_phdr);
+	close(fd);
+	cvmlinux_memsz = elf_exec_memsz(&cvmlinux_ehdr, cvmlinux_phdr);
+
+	kernel_memsz = vmlinux_memsz;
+
+	/* Add decompressor code size */
+	kernel_memsz += cvmlinux_memsz - vmlinux_gz_size;
+
+	/* Refer arch/x86_64/boot/compressed/misc.c for following adj.
+	 * Add 8 bytes for every 32K input block
+	 */
+	kernel_memsz += vmlinux_memsz >> 12;
+
+	/* Add 32K + 18 bytes of extra slack */
+	kernel_memsz = kernel_memsz + (32768 + 18);
+
+	/* Align on a 4K boundary. */
+	kernel_memsz = (kernel_memsz + 4095) & (~4095);
 
 	if (lseek(1, 88, SEEK_SET) != 88)		    /* Write sizes to the bootsector */
 		die("Output: seek failed");
diff -puN arch/x86_64/boot/Makefile~x86_64-bzImage-mem-size-adjustment-fix arch/x86_64/boot/Makefile
--- linux-2.6.18-rc3-1M/arch/x86_64/boot/Makefile~x86_64-bzImage-mem-size-adjustment-fix	2006-08-11 00:53:32.000000000 -0400
+++ linux-2.6.18-rc3-1M-root/arch/x86_64/boot/Makefile	2006-08-11 00:56:27.000000000 -0400
@@ -41,7 +41,8 @@ $(obj)/bzImage: BUILDFLAGS := -b
 
 quiet_cmd_image = BUILD $@
 cmd_image = $(obj)/tools/build $(BUILDFLAGS) $(obj)/bootsect $(obj)/setup \
-	    $(obj)/vmlinux.bin $(ROOT_DEV) vmlinux > $@
+	    $(obj)/vmlinux.bin $(ROOT_DEV) vmlinux \
+	    $(obj)/compressed/vmlinux.bin.gz $(obj)/compressed/vmlinux > $@
 
 $(obj)/bzImage: $(obj)/bootsect $(obj)/setup \
 			      $(obj)/vmlinux.bin $(obj)/tools/build FORCE
diff -puN arch/x86_64/boot/compressed/vmlinux.lds~x86_64-bzImage-mem-size-adjustment-fix arch/x86_64/boot/compressed/vmlinux.lds
--- linux-2.6.18-rc3-1M/arch/x86_64/boot/compressed/vmlinux.lds~x86_64-bzImage-mem-size-adjustment-fix	2006-08-11 01:29:52.000000000 -0400
+++ linux-2.6.18-rc3-1M-root/arch/x86_64/boot/compressed/vmlinux.lds	2006-08-11 01:32:00.000000000 -0400
@@ -40,5 +40,7 @@ SECTIONS
 		pgtable = . ;
 		. = . + 4096 * 6;
 		_heap = .;
+		. = . + 0x6000; /* misc.c, Heap size. */
+		_heap_end = .;
 	}
 }
_
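The MemSize arithmetic this patch adds to `main()` in build.c can be restated as a standalone function, which makes it easier to check against concrete numbers. This is a sketch that mirrors the patch's steps, assuming the three input sizes have already been measured; the function name `bzimage_memsz` and the example sizes are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mirrors the MemSize adjustments in the patch above:
 *   vmlinux_memsz  - span of the uncompressed kernel's PT_LOAD segments
 *   cvmlinux_memsz - span of the compressed vmlinux (payload + decompressor)
 *   gz_size        - size of vmlinux.bin.gz, i.e. the payload alone
 */
static uint64_t bzimage_memsz(uint64_t vmlinux_memsz, uint64_t cvmlinux_memsz,
			      uint64_t gz_size)
{
	uint64_t memsz = vmlinux_memsz;

	assert(cvmlinux_memsz >= gz_size);  /* payload is contained in the compressed vmlinux */

	/* Decompressor code, data, bss, page tables and heap. */
	memsz += cvmlinux_memsz - gz_size;
	/* 8 bytes per 32K block: 32768 / 8 == 4096, hence the shift by 12. */
	memsz += vmlinux_memsz >> 12;
	/* Fixed slack for safe decompression. */
	memsz += 32768 + 18;
	/* Round up to a 4K boundary. */
	return (memsz + 4095) & ~(uint64_t)4095;
}
```

With hypothetical inputs of 0x600000 (kernel), 0x200000 (compressed vmlinux) and 0x1F0000 (payload) bytes, the function returns 0x619000: 0x10000 for the decompressor, 0x600 of per-block overhead and 0x8012 of slack on top of the 0x600000 kernel, rounded up to the next 4K boundary.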
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-14 16:51 ` [Fastboot] " Vivek Goyal
@ 2006-08-14 17:04 ` H. Peter Anvin
  2006-08-14 18:11 ` Vivek Goyal
  2006-08-14 20:00 ` Eric W. Biederman
  0 siblings, 2 replies; 47+ messages in thread
From: H. Peter Anvin @ 2006-08-14 17:04 UTC
To: vgoyal
Cc: Eric W. Biederman, Don Zickus, fastboot, Horms, Jan Kratochvil, Magnus Damm, linux-kernel

Vivek Goyal wrote:
> On Thu, Aug 10, 2006 at 02:09:58PM -0600, Eric W. Biederman wrote:
>>> I just reserved memory at non 2MB aligned location 65MB@15MB so that
>>> kernel is loaded at 16MB and other smaller segments below the compressed
>>> image, then I can successfully booted into the kdump kernel.
>> :)
>>
>>> So basically kexec on panic path seems to be clean except stomping issue.
>>> May be bzImage program header should reflect right "MemSize" which
>>> takes into account extra memory space calculations.
>> Yes. That sounds like the right thing to do.
>>
>> I remember trying to compute a good memsize when I created the bzImage
>> header but it is completely possible I missed some part of the
>> calculation or assumed that the kernels .bss section would always be
>> larger than what I needed for decompression.
>>

Could someone please describe the intended semantics of this MemSize
header, *and* its intended usage?

	-hpa
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-14 17:04 ` H. Peter Anvin
@ 2006-08-14 18:11 ` Vivek Goyal
  2006-08-14 19:32 ` H. Peter Anvin
  2006-08-14 20:00 ` Eric W. Biederman
  1 sibling, 1 reply; 47+ messages in thread
From: Vivek Goyal @ 2006-08-14 18:11 UTC
To: H. Peter Anvin
Cc: Eric W. Biederman, Don Zickus, fastboot, Horms, Jan Kratochvil, Magnus Damm, linux-kernel

On Mon, Aug 14, 2006 at 10:04:29AM -0700, H. Peter Anvin wrote:
> Vivek Goyal wrote:
> >On Thu, Aug 10, 2006 at 02:09:58PM -0600, Eric W. Biederman wrote:
> >>>I just reserved memory at non 2MB aligned location 65MB@15MB so that
> >>>kernel is loaded at 16MB and other smaller segments below the compressed
> >>>image, then I can successfully booted into the kdump kernel.
> >>:)
> >>
> >>>So basically kexec on panic path seems to be clean except stomping issue.
> >>>May be bzImage program header should reflect right "MemSize" which
> >>>takes into account extra memory space calculations.
> >>Yes. That sounds like the right thing to do.
> >>
> >>I remember trying to compute a good memsize when I created the bzImage
> >>header but it is completely possible I missed some part of the
> >>calculation or assumed that the kernels .bss section would always be
> >>larger than what I needed for decompression.
> >>
>
> Could someone please describe the intended semantics of this MemSize
> header, *and* its intended usage?
>

Now an ELF header (attached to the bzImage) is being used to describe
the kernel executable. A single program header of type PT_LOAD is
created. The "p_filesz" field of that program header describes the
vmlinux file size, and "p_memsz" gives how much memory the kernel image
will consume at load time.

Ideally "p_memsz" would be the sum of the "p_memsz" values of all the
program headers of the vmlinux file, but I guess in this case we are
stretching the ELF specification a little and also taking into account
the additional memory used by the decompressor and the decompression
logic by the time execution is transferred to the actual kernel.

The intended usage is currently kexec/kdump. While pre-loading a kernel
in memory, kexec creates multiple segments and puts various data into
them (kernel image, initrd, parameters, etc.). Kexec needs to know how
much memory the loaded kernel uses so that it can place another segment
after the kernel at a safe distance. By reading "p_memsz" from the ELF
header, kexec can determine this.

The problem we currently face in the kdump case is that the parameter
segment (command line and other boot loader parameters) is placed
immediately after the kernel, where it gets stomped over by the
decompressor code, and the kernel fails to boot. Normal boot never hits
this problem, because the parameter segment is always loaded below the
kernel image.

Thanks
Vivek
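How kexec uses the field can be sketched as follows. The `struct load_seg` type, `align_up` and `next_free_addr` are hypothetical illustrations, not kexec-tools code, and the 4K alignment is an assumption. The point is that the next segment has to start past `p_paddr + p_memsz`, not past `p_paddr + p_filesz`.

```c
#include <assert.h>
#include <stdint.h>

/* The subset of an ELF64 program header this sketch needs. */
struct load_seg {
	uint64_t paddr;   /* p_paddr: where the segment is loaded */
	uint64_t filesz;  /* p_filesz: bytes taken from the image */
	uint64_t memsz;   /* p_memsz: bytes the segment occupies once running */
};

/* Round x up to a power-of-two alignment. */
static uint64_t align_up(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/*
 * Lowest address at which a loader may place another segment without it
 * being overwritten: the end of the kernel's p_memsz footprint, rounded up.
 */
static uint64_t next_free_addr(const struct load_seg *kernel, uint64_t align)
{
	return align_up(kernel->paddr + kernel->memsz, align);
}
```

With the kernel loaded at 16MB (0x1000000) and a hypothetical p_memsz of 0x619000, the next segment may start at 0x1619000 at the earliest. If p_memsz understates what the decompressor actually touches, that address falls inside memory the decompressor is about to use, which is the kdump failure described in the message above.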
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-14 18:11 ` Vivek Goyal
@ 2006-08-14 19:32 ` H. Peter Anvin
  2006-08-14 19:42 ` Vivek Goyal
  0 siblings, 1 reply; 47+ messages in thread
From: H. Peter Anvin @ 2006-08-14 19:32 UTC
To: vgoyal
Cc: Eric W. Biederman, Don Zickus, fastboot, Horms, Jan Kratochvil, Magnus Damm, linux-kernel

Vivek Goyal wrote:
> On Mon, Aug 14, 2006 at 10:04:29AM -0700, H. Peter Anvin wrote:
>> Vivek Goyal wrote:
>>> On Thu, Aug 10, 2006 at 02:09:58PM -0600, Eric W. Biederman wrote:
>>>>> I just reserved memory at non 2MB aligned location 65MB@15MB so that
>>>>> kernel is loaded at 16MB and other smaller segments below the compressed
>>>>> image, then I can successfully booted into the kdump kernel.
>>>> :)
>>>>
>>>>> So basically kexec on panic path seems to be clean except stomping issue.
>>>>> May be bzImage program header should reflect right "MemSize" which
>>>>> takes into account extra memory space calculations.
>>>> Yes. That sounds like the right thing to do.
>>>>
>>>> I remember trying to compute a good memsize when I created the bzImage
>>>> header but it is completely possible I missed some part of the
>>>> calculation or assumed that the kernels .bss section would always be
>>>> larger than what I needed for decompression.
>>>>
>> Could someone please describe the intended semantics of this MemSize
>> header, *and* its intended usage?
>>
>
> Now and ELF header(attached to bzImage) is being used to describe
> the kernel executable. One program header of PT_LOAD type is being
> created. The "p_filesz" field of program header is basically
> describing the vmlinux file size and "p_memsz" is giving how
> much memory will be consumed by kernel image at load time.
>
> Ideally "p_memsz" should be "p_memsz" summation of all the program
> headers of vmlinux file but I guess in this case we are stretching the
> ELF specification a little bit and also taking into the account the
> additional memory which will be used by decompressor and decompression
> logic by the time execution is transferred to the actual kernel.
>

What about once the kernel is booted?

	-hpa
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-14 19:32 ` H. Peter Anvin
@ 2006-08-14 19:42 ` Vivek Goyal
  2006-08-14 19:45 ` H. Peter Anvin
  0 siblings, 1 reply; 47+ messages in thread
From: Vivek Goyal @ 2006-08-14 19:42 UTC
To: H. Peter Anvin
Cc: Eric W. Biederman, Don Zickus, fastboot, Horms, Jan Kratochvil, Magnus Damm, linux-kernel

On Mon, Aug 14, 2006 at 12:32:32PM -0700, H. Peter Anvin wrote:
> Vivek Goyal wrote:
> >On Mon, Aug 14, 2006 at 10:04:29AM -0700, H. Peter Anvin wrote:
> >>Vivek Goyal wrote:
> >>>On Thu, Aug 10, 2006 at 02:09:58PM -0600, Eric W. Biederman wrote:
> >>>>>I just reserved memory at non 2MB aligned location 65MB@15MB so that
> >>>>>kernel is loaded at 16MB and other smaller segments below the
> >>>>>compressed
> >>>>>image, then I can successfully booted into the kdump kernel.
> >>>>:)
> >>>>
> >>>>>So basically kexec on panic path seems to be clean except stomping
> >>>>>issue.
> >>>>>May be bzImage program header should reflect right "MemSize" which
> >>>>>takes into account extra memory space calculations.
> >>>>Yes. That sounds like the right thing to do.
> >>>>
> >>>>I remember trying to compute a good memsize when I created the bzImage
> >>>>header but it is completely possible I missed some part of the
> >>>>calculation or assumed that the kernels .bss section would always be
> >>>>larger than what I needed for decompression.
> >>>>
> >>Could someone please describe the intended semantics of this MemSize
> >>header, *and* its intended usage?
> >>
> >
> >Now and ELF header(attached to bzImage) is being used to describe
> >the kernel executable. One program header of PT_LOAD type is being
> >created. The "p_filesz" field of program header is basically
> >describing the vmlinux file size and "p_memsz" is giving how
> >much memory will be consumed by kernel image at load time.
> >
> >Ideally "p_memsz" should be "p_memsz" summation of all the program
> >headers of vmlinux file but I guess in this case we are stretching the
> >ELF specification a little bit and also taking into the account the
> >additional memory which will be used by decompressor and decompression
> >logic by the time execution is transferred to the actual kernel.
> >
>
> What about once the kernel is booted?
>

Sorry, I did not understand the question. A few more lines would help.

Thanks
Vivek
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-14 19:42 ` Vivek Goyal
@ 2006-08-14 19:45 ` H. Peter Anvin
  2006-08-14 19:57 ` Vivek Goyal
  2006-08-14 20:10 ` Eric W. Biederman
  0 siblings, 2 replies; 47+ messages in thread
From: H. Peter Anvin @ 2006-08-14 19:45 UTC
To: vgoyal
Cc: Eric W. Biederman, Don Zickus, fastboot, Horms, Jan Kratochvil, Magnus Damm, linux-kernel

Vivek Goyal wrote:
>>>
>> What about once the kernel is booted?
>
> Sorry did not understand the question. Few more lines will help.
>

Is this field intended to protect any kind of memory during the early
boot phase of the kernel proper, or only the decompressor?

	-hpa
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-14 19:45 ` H. Peter Anvin
@ 2006-08-14 19:57 ` Vivek Goyal
  2006-08-14 20:10 ` Eric W. Biederman
  1 sibling, 0 replies; 47+ messages in thread
From: Vivek Goyal @ 2006-08-14 19:57 UTC
To: H. Peter Anvin
Cc: Eric W. Biederman, Don Zickus, fastboot, Horms, Jan Kratochvil, Magnus Damm, linux-kernel

On Mon, Aug 14, 2006 at 12:45:31PM -0700, H. Peter Anvin wrote:
> Vivek Goyal wrote:
> >>>
> >>What about once the kernel is booted?
> >
> >Sorry did not understand the question. Few more lines will help.
> >
>
> Is this field intended to protect any kind of memory during the early
> boot phase of the kernel proper, or only the decompressor?
>

I think it should also protect against any dynamic memory usage during
the early boot phase, until we reach the point where the kernel is aware
of the BIOS-provided memory maps and kernel memory usage can be
controlled with the help of the BIOS-provided or user-defined memory
maps.

In the i386 implementation Eric is already taking into account the
memory used by the bootmem bitmap and the initial page tables. I have
not yet checked the x86_64 kernel code to see whether I need to make
such adjustments there; it worked for me, so I did not bother much. I
will look into it.

Thanks
Vivek
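For a sense of scale of one of the early allocations Vivek mentions: the bootmem allocator tracks each page with one bit, so its bitmap grows with the amount of RAM it manages. The sketch below estimates that size assuming 4K pages and a bitmap rounded up to whole pages; the function name is hypothetical, and the kernel's own accounting may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Bytes needed for a bootmem-style bitmap covering ram_bytes of memory:
 * one bit per 4K page, rounded up to whole bytes and then to whole pages.
 */
static uint64_t bootmem_bitmap_bytes(uint64_t ram_bytes)
{
	uint64_t pages = (ram_bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
	uint64_t bytes = (pages + 7) / 8;
	uint64_t rounded = (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE * SKETCH_PAGE_SIZE;

	assert(rounded * 8 >= pages);  /* every page has its bit */
	return rounded;
}
```

By this estimate 64MB of memory needs one 4K page of bitmap and 4GB needs 128K. These are small amounts, but enough to overwrite a segment placed directly after the kernel if MemSize leaves them out.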
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-14 19:45 ` H. Peter Anvin
  2006-08-14 19:57 ` Vivek Goyal
@ 2006-08-14 20:10 ` Eric W. Biederman
  2006-08-14 20:59 ` Vivek Goyal
  1 sibling, 1 reply; 47+ messages in thread
From: Eric W. Biederman @ 2006-08-14 20:10 UTC
To: H. Peter Anvin
Cc: vgoyal, Don Zickus, fastboot, Horms, Jan Kratochvil, Magnus Damm, linux-kernel

"H. Peter Anvin" <hpa@zytor.com> writes:

> Vivek Goyal wrote:
>>>>
>>> What about once the kernel is booted?
>> Sorry did not understand the question. Few more lines will help.
>>
>
> Is this field intended to protect any kind of memory during the early boot phase
> of the kernel proper, or only the decompressor?

Yes, the field should account for memory usage until the kernel starts
doing the accounting at run time.

I'm actually surprised that taking the .bss into account was not enough
to cover anything the decompressor was doing. Usually the kernel's .bss
is more than the extra 32K or so that the decompressor uses.

Eric
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-14 20:10 ` Eric W. Biederman
@ 2006-08-14 20:59 ` Vivek Goyal
  2006-08-14 21:15 ` Eric W. Biederman
  0 siblings, 1 reply; 47+ messages in thread
From: Vivek Goyal @ 2006-08-14 20:59 UTC
To: Eric W. Biederman
Cc: H. Peter Anvin, Don Zickus, fastboot, Horms, Jan Kratochvil, Magnus Damm, linux-kernel

On Mon, Aug 14, 2006 at 02:10:51PM -0600, Eric W. Biederman wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
>
> > Vivek Goyal wrote:
> >>>>
> >>> What about once the kernel is booted?
> >> Sorry did not understand the question. Few more lines will help.
> >>
> >
> > Is this field intended to protect any kind of memory during the early boot phase
> > of the kernel proper, or only the decompressor?
>
> Yes, the field should account for memory usage until the kernel starts
> doing the accounting at run time.
>
> I'm actually surprised that taking into account the .bss was not enough to
> cover up anything the decompressor was doing. Usually the kernel's .bss
> is more than the extra 32K or so that the decompressor uses.
>

I think the .bss section size acts as a buffer for the decompressor only
if .bss is not part of the compressed data; in that case the decompressor
does not have to move beyond the bss and can run quite well from the
kernel's bss space.

But on my machine it looks like the bss is very much part of the raw
binary image, and hence part of the compressed data (vmlinux.bin.gz).
The memsz exported in the bzImage is the same as the size of the raw
output binary.

That is probably why we are stomping on other segments in my case, and
if my understanding is right, it should happen irrespective of the
kernel's bss size.

Here is what the program headers of the kernel vmlinux file look like.
.bss is mapped by the first program header along with .text.

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000200000 0xffffffff80000000 0x0000000000000000
                 0x0000000000546bf8 0x00000000005dbc28  RWE    200000
  LOAD           0x00000000007dc000 0xffffffff805dc000 0x00000000005dc000
                 0x000000000000ede0 0x000000000000ede0  RW     200000
  LOAD           0x0000000000800000 0xffffffffff600000 0x00000000005eb000
                 0x0000000000000c08 0x0000000000000c08  RWE    200000
  LOAD           0x00000000009ec000 0xffffffff805ec000 0x00000000005ec000
                 0x0000000000044004 0x0000000000044004  RWE    200000
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RWE    8

 Section to Segment mapping:
  Segment Sections...
   00     .text __ex_table .rodata .pci_fixup __ksymtab __ksymtab_gpl __ksymtab_unused __ksymtab_gpl_future __ksymtab_strings __param .eh_frame .data .bss
   01     .data.cacheline_aligned .data.read_mostly
   02     .vsyscall_0 .xtime_lock .vxtime .wall_jiffies .sys_tz .sysctl_vsyscall .xtime .jiffies .vsyscall_1 .vsyscall_2 .vsyscall_3
   03     .data.init_task .data.page_aligned .smp_altinstructions .smp_locks .smp_altinstr_replacement .init.text .init.data .init.setup .initcall.init .con_initcall.init .altinstructions .altinstr_replacement .exit.text .init.ramfs .data.percpu .data_nosave
   04

Thanks
Vivek
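Vivek's point can be modeled in a few lines of C. The sketch assumes a simplified view of `objcopy -O binary`: output starts at the lowest section with file contents and runs to the end of the last such section, with any NOBITS space in between (such as a mid-image .bss) emitted as zeros. Only NOBITS space after the last section with contents stays out of the raw binary and could serve as decompressor scratch. The `struct sect` type and the helper names are hypothetical, and real objcopy handling of segments, gap filling and load addresses is more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified section descriptor. */
struct sect {
	uint64_t addr;      /* load address */
	uint64_t size;
	int has_contents;   /* 0 for NOBITS sections such as .bss */
};

/* Lowest address of any section that has file contents. */
static uint64_t file_start(const struct sect *s, size_t n)
{
	uint64_t lo = UINT64_MAX;
	for (size_t i = 0; i < n; i++)
		if (s[i].has_contents && s[i].addr < lo)
			lo = s[i].addr;
	return lo;
}

/* End of the last section with contents (contents_only) or of any section. */
static uint64_t end_addr(const struct sect *s, size_t n, int contents_only)
{
	uint64_t hi = 0;
	for (size_t i = 0; i < n; i++) {
		if (contents_only && !s[i].has_contents)
			continue;
		if (s[i].addr + s[i].size > hi)
			hi = s[i].addr + s[i].size;
	}
	return hi;
}

/* Raw binary size: NOBITS gaps between sections count as emitted zeros. */
static uint64_t raw_binary_size(const struct sect *s, size_t n)
{
	uint64_t lo = file_start(s, n);
	assert(lo != UINT64_MAX);  /* need at least one section with contents */
	return end_addr(s, n, 1) - lo;
}

/* NOBITS space past the end of the raw binary. */
static uint64_t trailing_nobits(const struct sect *s, size_t n)
{
	return end_addr(s, n, 0) - end_addr(s, n, 1);
}
```

A layout like the vmlinux above, where data follows .bss, gives a raw binary that carries the whole .bss as zeros and leaves no free tail. Moving .bss to the end would, in this model, shrink the raw binary by the size of .bss and leave that space past the image free.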
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-14 20:59 ` Vivek Goyal
@ 2006-08-14 21:15 ` Eric W. Biederman
  0 siblings, 0 replies; 47+ messages in thread
From: Eric W. Biederman @ 2006-08-14 21:15 UTC
To: vgoyal
Cc: H. Peter Anvin, Don Zickus, fastboot, Horms, Jan Kratochvil, Magnus Damm, linux-kernel

Vivek Goyal <vgoyal@in.ibm.com> writes:

> On Mon, Aug 14, 2006 at 02:10:51PM -0600, Eric W. Biederman wrote:
>> "H. Peter Anvin" <hpa@zytor.com> writes:
>>
>> > Vivek Goyal wrote:
>> >>>>
>> >>> What about once the kernel is booted?
>> >> Sorry did not understand the question. Few more lines will help.
>> >>
>> >
>> > Is this field intended to protect any kind of memory during the early boot
> phase
>> > of the kernel proper, or only the decompressor?
>>
>> Yes, the field should account for memory usage until the kernel starts
>> doing the accounting at run time.
>>
>> I'm actually surprised that taking into account the .bss was not enough to
>> cover up anything the decompressor was doing. Usually the kernel's .bss
>> is more than the extra 32K or so that the decompressor uses.
>>
>
> I think .bss section size will act as a buffer for decompressor only if
> .bss is not part of compressed data hence decompressor does not have to
> move beyond bss and it can run very well from kernel bss space.

Agreed.

> But somehow on my machine, it looks like that bss is very much part
> of raw binary image hence part of compressed data (vmlinux.bin.gz).
> memsz exported in bzImage is same as size of raw output binary.
>
> Probably that's the reason that we are stomping other segments in my
> case and if my understanding is right then it should happen irrespective
> of kernel bss size.
>
> Here I am pasting how kernel vmlinux file program headers look like.
> .bss is mapped by first program header along with .text.

Ok. So somehow we have done the insane thing of putting .bss in the
middle of the executable. It might even be sane if it were just the
.init sections we put after it, but no, we are putting .data after the
.bss. Well, that easily explains why we had a problem.

Getting the proper accounting in to handle this case is probably
reasonable. It probably also makes sense for someone to take a good
hard look at the crazy ordering of sections on x86_64.

Eric
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-14 17:04 ` H. Peter Anvin
  2006-08-14 18:11 ` Vivek Goyal
@ 2006-08-14 20:00 ` Eric W. Biederman
  1 sibling, 0 replies; 47+ messages in thread
From: Eric W. Biederman @ 2006-08-14 20:00 UTC
To: H. Peter Anvin
Cc: vgoyal, Don Zickus, fastboot, Horms, Jan Kratochvil, Magnus Damm, linux-kernel

"H. Peter Anvin" <hpa@zytor.com> writes:

> Vivek Goyal wrote:
>> On Thu, Aug 10, 2006 at 02:09:58PM -0600, Eric W. Biederman wrote:
>>>> I just reserved memory at non 2MB aligned location 65MB@15MB so that
>>>> kernel is loaded at 16MB and other smaller segments below the compressed
>>>> image, then I can successfully booted into the kdump kernel.
>>> :)
>>>
>>>> So basically kexec on panic path seems to be clean except stomping issue.
>>>> May be bzImage program header should reflect right "MemSize" which
>>>> takes into account extra memory space calculations.
>>> Yes. That sounds like the right thing to do.
>>>
>>> I remember trying to compute a good memsize when I created the bzImage
>>> header but it is completely possible I missed some part of the
>>> calculation or assumed that the kernels .bss section would always be
>>> larger than what I needed for decompression.
>>>
>
> Could someone please describe the intended semantics of this MemSize header,
> *and* its intended usage?

I think Vivek did a decent job, but here is my take.

Currently the ELF header we prepend to the Linux kernel has exactly one
segment. A segment has several fields: file offset, alignment, type,
physical address, virtual address, file size, and memory size.

The file size describes how much data to pull off the disk. The memory
size describes how much room the segment will consume in memory. The
difference between file size and memory size is treated as bss data,
so the memory size must never be smaller than the file size.

In the case of the kernel there is a certain amount of memory that the
kernel uses before it starts reserving things and using the memory map.
The memory that the kernel unconditionally uses should be described
with the memsize parameter.

An accurate description allows your initrd and your parameter segment
to be placed right up next to your kernel without worrying about them
being stomped on (we already do this on a couple of other
architectures), or it lets you detect that there is not enough room to
hold your kernel, initrd and parameters.

So, since we now have the possibility of describing this accurately, I
would like to. The traditional x86 workaround of pushing everything as
high in memory as we can, and as high as the kernel can address, is
still potentially an option, though.

For the kexec on panic case we have a very small reserved chunk of
memory (16MB, I think, is typical right now), and the smaller the area
we can successfully run out of, the better. That makes it easy to hit
these kinds of problems if we don't have an accurate description of the
kernel.

Eric
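The file-size/memory-size rule Eric describes can be sketched in C. `load_segment` is a hypothetical helper, not code from kexec or any boot loader: it copies `p_filesz` bytes, zero-fills the rest of `p_memsz` as bss, rejects a header whose file size exceeds its memory size, and reports when the segment would not fit in the space available instead of overrunning it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Load one PT_LOAD-style segment into dst: copy filesz bytes from the
 * image and zero the remaining memsz - filesz bytes (the bss part).
 * Returns 0 on success, -1 if the header is invalid or dst is too small.
 */
static int load_segment(uint8_t *dst, uint64_t dst_size,
			const uint8_t *file_data, uint64_t filesz, uint64_t memsz)
{
	assert(dst != NULL && (file_data != NULL || filesz == 0));

	if (filesz > memsz)    /* p_filesz may never exceed p_memsz */
		return -1;
	if (memsz > dst_size)  /* not enough room: report it, don't overrun */
		return -1;

	memcpy(dst, file_data, (size_t)filesz);
	memset(dst + filesz, 0, (size_t)(memsz - filesz));
	return 0;
}
```

Because the check is against memsz rather than filesz, the "not enough room" case Eric mentions is only detected correctly when the header's memory size really covers everything the kernel touches.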
* Re: [CFT] ELF Relocatable x86 and x86_64 bzImages
  2006-08-07 23:57 ` Don Zickus
  2006-08-08 5:01 ` Eric W. Biederman
@ 2006-08-08 23:36 ` Andi Kleen
  1 sibling, 0 replies; 47+ messages in thread
From: Andi Kleen @ 2006-08-08 23:36 UTC
To: Don Zickus
Cc: fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel

Don Zickus <dzickus@redhat.com> writes:

> >
> > Odd. I wonder if I'm missing a serializing instruction somewhere,
> > to ensure the effects of ``self modifying code'' aren't a problem.
> > As I read Intels Documentation if you have a jump before you get
> > to the code there shouldn't be a problem.
> >
> > Still that doesn't really explain bytes_out.
>
> Sounds nasty.
>
> So I narrowed down the problem but it isn't obvious to me why this problem
> exists. Basically, even though bytes_out is supposed to be initialized to
> 0, it becomes -1 before entering decompress_kernel(). Of course, the
> fallout is in flush_window() bytes_out wounds up being one less than
> outcnt and hence my original problem.
>
> Any thoughts on how to debug where this could be getting corrupted?

Use a simulator such as qemu or AMD SimNow (hopefully you can reproduce
it there) and set a watchpoint on the address? Or try to find someone
who has an Intel target probe to help you out.

-Andi
* Re: [CFT] ELF Relocatable x86 and x86_64 bzImages
From: Vivek Goyal @ 2006-08-25 20:16 UTC
To: Eric W. Biederman
Cc: fastboot, Jan Kratochvil, Magnus Damm, Horms, Linda Wang, linux-kernel, H. Peter Anvin, linuxppc64-dev

On Mon, Jul 31, 2006 at 10:19:04AM -0600, Eric W. Biederman wrote:
>
> I have spent some time and have gotten my relocatable kernel patches
> working against the latest kernels. I intend to push this upstream
> shortly.
>
> Could all of the people who care take a look and test this out
> to make certain that it doesn't just work on my test box?
>
> My approach is to extend bzImage so that it is an ET_DYN ELF executable
> (we have what used to be a bootsector where we can put the header).
> Boot loaders are explicitly not expected to process relocations.
>
> The x86_64 kernel is simply built to live at a fixed virtual address
> and the boot page tables are relocated. The i386 kernel is built
> to process relocates generated with --embedded-relocs (after vmlinux.lds.S)
> has been fixed up to sort out static and dynamic relocations.
>
> Currently there are 33 patches in my tree to do this.
>
> The weirdest symptom I have had so far is that page faults did not
> trigger the early exception handler on x86_64 (instead I got a reboot).
>
> The code should be available shortly at:
> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-reloc.git#reloc-v2.6.18-rc3
>
> If all goes well with the testing I will push the patches to Andrew in the next couple
> of days.

It breaks the powerpc build, as powerpc does not seem to define the symbol _text, which is used by the arch-independent kallsyms.c. Attached is the one-line fix.
Thanks
Vivek


o ppc64 does not seem to define the symbol _text, which is used by kernel/kallsyms.c for the relocatable kernel patches. Instead of absolute addresses, symbol addresses are now stored as offsets from the symbol _text (_text + offset), so that relocation entries for this section are generated if need be (currently i386 will be the only user once the relocatable kernel patches are merged).

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/powerpc/kernel/vmlinux.lds.S |    1 +
 1 file changed, 1 insertion(+)

diff -puN arch/powerpc/kernel/vmlinux.lds.S~ppc64-compilation-fix arch/powerpc/kernel/vmlinux.lds.S
--- linux-2.6.18-rc3-1M/arch/powerpc/kernel/vmlinux.lds.S~ppc64-compilation-fix	2006-08-24 16:16:17.000000000 -0400
+++ linux-2.6.18-rc3-1M-root/arch/powerpc/kernel/vmlinux.lds.S	2006-08-24 16:26:33.000000000 -0400
@@ -33,6 +33,7 @@ SECTIONS
 	/* Text and gots */
 	.text : {
+		_text = .;
 		*(.text .text.*)
 		SCHED_TEXT
 		LOCK_TEXT
_
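The idea behind the kallsyms change can be sketched in a few lines of C: instead of storing absolute addresses, each of which would need its own fix-up, store every symbol as an offset from one base symbol (`_text` in the patch) and add the base back when an address is needed. This sketch applies the base at run time rather than through linker-generated relocation entries, and the table, symbol offsets and `sym_address` helper are hypothetical; the real tables are generated by scripts/kallsyms in a different format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table entry: address stored relative to _text. */
struct sym_entry {
    const char *name;
    uint32_t offset;  /* symbol address minus the address of _text */
};

/* Made-up offsets, for illustration only. */
static const struct sym_entry sym_table[] = {
    { "startup_32",   0x0000 },
    { "start_kernel", 0x1234 },
    { "do_exit",      0x5678 },
};

/* Return the run-time address of name for a kernel whose text starts at
 * text_base, or 0 if the symbol is not in the table. Only the base has
 * to change when the kernel is loaded somewhere else. */
static uintptr_t sym_address(uintptr_t text_base, const char *name)
{
    for (size_t i = 0; i < sizeof(sym_table) / sizeof(sym_table[0]); i++) {
        if (strcmp(sym_table[i].name, name) == 0)
            return text_base + sym_table[i].offset;
    }
    return 0;
}
```

The same table resolves correctly whether the kernel's text starts at 0xC0100000 or at 0x01000000, which is why an architecture that never defines `_text` fails to link once kallsyms refers to it.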
* Re: [Fastboot] [CFT] ELF Relocatable x86 and x86_64 bzImages
From: Bodo Eggert @ 2006-08-09 12:40 UTC
To: Eric W. Biederman, Don Zickus, fastboot, Horms, Jan Kratochvil, H. Peter Anvin, Magnus Damm, linux-kernel

Eric W. Biederman <ebiederm@xmission.com> wrote:

> Odd. I wonder if I'm missing a serializing instruction somewhere,
> to ensure the effects of ``self modifying code'' aren't a problem.
> As I read Intel's documentation, if you have a jump before you get
> to the code there shouldn't be a problem.

ACK, a short jump to the next instruction *should* be all it takes, but if it doesn't, maybe a long jump will do the trick.
-- 
I thank GMX for sabotaging the use of my addresses by means of lies spread via SPF.
http://david.woodhou.se/why-not-spf.html
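Bodo's short-jump suggestion can be sketched with GCC/Clang inline assembly. This is a hedged illustration, not code from the decompressor: the helper name `sync_after_code_patch` is hypothetical, it assumes an x86 target (elsewhere it compiles to a no-op), and it adds CPUID as an optional extra step because Intel documents CPUID as a serializing instruction, a stronger guarantee than a jump. The far ("long") jump Bodo mentions is not shown.

```c
/* Hypothetical helper: call after storing new instruction bytes and
 * before executing them. Returns 1 if x86 synchronization was issued,
 * 0 on other architectures (where it does nothing). */
static int sync_after_code_patch(int use_cpuid)
{
#if defined(__i386__) || defined(__x86_64__)
    /* Short jump to the very next instruction: the older Intel recipe
     * for self-modifying code, which discards prefetched (possibly
     * stale) instruction bytes. The "memory" clobber also stops the
     * compiler from moving the patching stores past this point. */
    __asm__ __volatile__("jmp 1f\n1:\n" ::: "memory");

    if (use_cpuid) {
        unsigned int eax = 0, ebx, ecx = 0, edx;
        /* CPUID is serializing: earlier instructions and buffered
         * writes complete before the next instruction is fetched. */
        __asm__ __volatile__("cpuid"
                             : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                             :
                             : "memory");
        (void)eax;
        (void)ebx;
        (void)ecx;
        (void)edx;
    }
    return 1;
#else
    (void)use_cpuid;
    return 0;
#endif
}
```

The jump is cheap and is what Eric's reading of the documentation relies on; CPUID costs more but does not depend on processor-specific prefetch behaviour, which makes it a reasonable fallback while chasing a problem like the bytes_out corruption.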