This is an extension to LKCD that uses Eric Biederman's kexec
implementation to delay the actual writeout of a crashdump to disk
until after a memory-preserving reboot into a new kernel.

The real thanks for this go to Dave Winchell and the rest of the
Mission Critical Linux folks for first implementing such an approach
in mcore using Werner Almesberger's bootimg, and for letting us learn
and borrow ideas from it. There is a subtle but crucial difference in
the design of the scheme we use to get spare pages to save the dump,
which potentially enables us to save a complete memory snapshot (not
just kernel pages) if we can get good compression efficiency (i.e.
theoretically limited only by the compressibility of the memory state
and the working memory space that must be left for the dump and
kernel bootup code).

This code is still somewhat raw. I have a list of todos, improvements
and loopholes to fix in mind, but I decided it was high time to put
this out as a start, so that anyone who is interested can take a
look, play with it, and maybe help out if they like.

I plan to fold it into lkcd cvs tomorrow if possible, unless anyone
notices a major regression of existing lkcd functionality (i.e.
without CONFIG_CRASH_DUMP_MEMDEV and CONFIG_CRASH_DUMP_SOFTBOOT). I
have tried out Alt+Sysrq+d and a simple panic from a module as a
sanity check. (I haven't tried it out for a true panic yet - going
there bit by bit :)) In any case, I'll tag the cvs tree before
checking in.

Merging and testing has been rather time consuming, so I would
appreciate it if anyone planning to check in changes before I do
would let me know ahead of time.

I'm also considering checking in a TODO file at the top of the 2.5
directory in CVS to keep track of what needs to be done. Would that
be a good idea? I'll probably also post the TODOs on the mailing
list.

OK, going ahead:

Steps to use:
--------------

A. Patching the kernel

1) Patch a vanilla 2.5.59 kernel with the kexec patches for 2.5.59.
   I picked the ones from the OSDL site which Andy Pfiffer had
   mentioned in an earlier post:

   kexec for 2.5.59 (based upon the version for 2.5.54):
   http://www.osdl.org/cgi-bin/plm?module=patch_info&patch_id=1442

   hwfixes, which makes it work for me (same as for 2.5.58):
   http://www.osdl.org/cgi-bin/plm?module=patch_info&patch_id=1444

2) Apply the latest dump patches from lkcd cvs, i.e. apply the kernel
   patches under 2.5/patches and copy the dump driver files to the
   appropriate places. (Expect to see one reject in the 2nd hunk for
   reboot.c when applying notify_die.patch - you can ignore it for
   now.)

3) Apply the attached patch (kexecdump.patch).

B. Kernel build configuration settings

   CRASH_DUMP must be built into the kernel (not as a module) to be
   able to dump across a kexec boot.

   CRASH_DUMP_BLOCKDEV and CRASH_DUMP_COMPRESS_GZIP are needed, as we
   use them today.

   New options you'll need: CRASH_DUMP_MEMDEV (memory dump driver)
   and CRASH_DUMP_SOFTBOOT (kexec based dumping).

C. Run-time setup

   A new dump flag for memory-save-and-dump-after-boot,
   DUMP_FLAGS_SOFTBOOT (0x2), has been introduced. It needs to be
   turned on in the dump flags.

   After running lkcd config as usual, there is one extra step needed
   to load the kernel to be kexec'ed. This involves executing
   "kexec -l" with the regular command line options (derived from
   your /proc/cmdline) and one extra boot parameter, obtained as
   follows:

       crashdump=`cat /proc/sys/kernel/dump/addr`

   (This tells the new kernel where to find a saved in-memory crash
   dump from the previous boot.)

   e.g.

       kexec -l --command-line="root=806 console=tty0 console=ttyS0,38400 crashdump=`cat /proc/sys/kernel/dump/addr`"

D. On panic, the dump is saved in memory and then kexec is used to
   boot up a new kernel (instead of a regular reboot). If Alt+Sysrq+d
   is pressed, the dump is just saved in memory without rebooting.

   [Note: The first few times you try it, it might be a good idea to
   drop into "init 1" and unmount most filesystems, or remount them
   read-only, before you force the panic - thanks to Andy Pfiffer for
   the tip.]

E. After the new kernel has booted, running "lkcd config" triggers a
   writeout to the dump disk of the dump previously saved in memory.

F. From here on, one can run "lkcd save" as usual to generate the
   /var/log/dump/* files for analysis.

Regards
Suparna

--
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Labs, India
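
A footnote to step C: the command line for the kernel to be kexec'ed is
just the current boot parameters plus a crashdump= parameter, so it can
be assembled by a small script. What follows is only a sketch under
stated assumptions. build_cmdline is a hypothetical helper (it is not
part of lkcd or kexec), the address used in the example call is a
made-up placeholder, and dropping an older crashdump= parameter is an
assumption about what one would want when the running kernel was itself
booted through kexec.

```shell
#!/bin/sh
# build_cmdline: hypothetical helper for illustration only; it is not
# part of lkcd or kexec.
#   $1: boot parameters of the running kernel (normally /proc/cmdline)
#   $2: dump address (normally /proc/sys/kernel/dump/addr)
# The body runs in a subshell so that "set -f" and the variables used
# here do not leak into the caller.
build_cmdline() (
    set -f                      # no glob expansion while splitting $1
    out=
    for arg in $1; do
        case $arg in
            crashdump=*) ;;     # assumption: drop a stale crashdump= value
            *) out="${out:+$out }$arg" ;;
        esac
    done
    # Append the fresh crashdump= parameter at the end.
    printf '%s\n' "${out:+$out }crashdump=$2"
)

# Placeholder values so the sketch runs on any machine:
build_cmdline "root=806 console=tty0 console=ttyS0,38400" "0x1000000"
```

On a machine set up as in step C, the two arguments would instead be
"$(cat /proc/cmdline)" and "$(cat /proc/sys/kernel/dump/addr)", and the
resulting string would be what is passed to "kexec -l" through
--command-line.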