2015-02-13 0:54 GMT+03:00 Denys Vlasenko:
> 64-bit code was using six stack slots less by not saving/restoring
> registers which are callee-preserved according to C ABI,
> and not allocating space for them.
> Only when syscall needed a complete "struct pt_regs",
> the complete area was allocated and filled in.
> As an additional twist, on interrupt entry a "slightly less truncated pt_regs"
> trick is used, to make nested interrupt stacks easier to unwind.
>
> This proved to be a source of significant obfuscation and subtle bugs.
> For example, stub_fork had to pop the return address,
> extend the struct, save registers, and push return address back. Ugly.
> ia32_ptregs_common pops return address and "returns" via jmp insn,
> throwing a wrench into CPU return stack cache.
>
> This patch changes code to always allocate a complete "struct pt_regs".
> The saving of registers is still done lazily.
>
> "Partial pt_regs" trick on interrupt stack is retained.
>
> Macros which manipulate "struct pt_regs" on stack are reworked:
> ALLOC_PT_GPREGS_ON_STACK allocates the structure.
> SAVE_C_REGS saves to it those registers which are clobbered by C code.
> SAVE_EXTRA_REGS saves to it all other registers.
> Corresponding RESTORE_* and REMOVE_PT_GPREGS_FROM_STACK macros reverse it.
>
> ia32_ptregs_common, stub_fork and friends lost their ugly dance with
> return pointer.
>
> LOAD_ARGS32 in ia32entry.S now uses symbolic stack offsets
> instead of magic numbers.
>
> error_entry and save_paranoid now use SAVE_C_REGS + SAVE_EXTRA_REGS
> instead of having it open-coded yet again.
>
> Patch was run-tested: 64-bit executables, 32-bit executables,
> strace works.
> Timing tests did not show measurable difference in 32-bit
> and 64-bit syscalls.

Hello Denys,

My test vm doesn't boot with this patch. Could you help to investigate
this issue? I have attached a kernel config and console log.

[ 2.428124] systemd-journald[284]: Received request to flush runtime journal from PID 1
[ 2.508252] traps: systemd-cgroups[380] general protection ip:7f68ad096028 sp:7fffba298af8 error:0 in ld-2.18.so[7f68ad07e000+20000][ OK
[ 2.600179] traps: systemd-cgroups[384] general protection ip:7f11b9a9c028 sp:7fff4420f978 error:0 in ld-2.18.so[7f11b9a84000+20000]
[ 2.743790] traps: systemd-cgroups[392] general protection ip:7f7f40a44028 sp:7fffe1c1b8b8 error:0 in ld-2.18.so[7f7f40a2c000+20000]
[ 2.754576] traps: systemd-cgroups[393] general protection ip:7fd1314bd028 sp:7ffff76ecc88 error:0 in ld-2.18.so[7fd1314a5000+20000]
[ 2.765343] traps: systemd-cgroups[396] general protection ip:7ff4537b7028 sp:7fff05902378 error:0 in ld-2.18.so[7ff45379f000+20000]
[ 2.798782] traps: systemd-cgroups[399] general protection ip:7f4d5bc9c028 sp:7fff35cb3a48 error:0 in ld-2.18.so[7f4d5bc84000+20000]
[ 3.376298] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 3.376298]
[ 3.377199] CPU: 2 PID: 1 Comm: systemd Not tainted 3.19.0+ #169
[ 3.377199] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 3.377199] 0000000000000000 00000000302f3f16 ffff88007c88bc48 ffffffff817b2f1a
[ 3.377199] 0000000000000000 ffffffff81a348f8 ffff88007c88bcc8 ffffffff817b14d8
[ 3.377199] ffff880000000010 ffff88007c88bcd8 ffff88007c88bc78 00000000302f3f16
[ 3.377199] Call Trace:
[ 3.377199] [] dump_stack+0x45/0x57
[ 3.377199] [] panic+0xd5/0x20e
[ 3.377199] [] do_exit+0xb15/0xb20
[ 3.377199] [] do_group_exit+0x4e/0xc0
[ 3.377199] [] get_signal+0x271/0x860
[ 3.377199] [] do_signal+0x37/0x760
[ 3.377199] [] ? wake_up_state+0x20/0x20
[ 3.377199] [] ? int_very_careful+0x5/0xd
[ 3.377199] [] ? trace_hardirqs_on_caller+0x13d/0x1e0
[ 3.377199] [] do_notify_resume+0x60/0x70
[ 3.377199] [] int_signal+0x12/0x17
[ 3.377199] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[ 3.377199] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 3.377199]

[avagin@localhost linux-2.6]$ git bisect log
# bad: [549c45cea13c1b1d4557dec2e5e3f256615682f6] Add linux-next specific files for 20150224
# good: [c517d838eb7d07bbe9507871fab3931deccff539] Linux 4.0-rc1
git bisect start 'next-20150224' 'v4.0-rc1'
# good: [bb37267e803b5d88eca99db1f501cb0410de60ff] Merge remote-tracking branch 'integrity/next'
git bisect good bb37267e803b5d88eca99db1f501cb0410de60ff
# good: [e6dad2c669e2029ee71bd05e858f69e12906dbfc] ia64: use %*pb[l] to print bitmaps including cpumasks and nodemasks
git bisect good e6dad2c669e2029ee71bd05e858f69e12906dbfc
# good: [e1a0636f6f12948e8e64afe107bda7cb189ef938] Merge remote-tracking branch 'kselftest/next'
git bisect good e1a0636f6f12948e8e64afe107bda7cb189ef938
# good: [e39b37bdf3aeda8fed17aa7dff42a6fecfc4f262] fs/ufs/super.c: fix potential race condition
git bisect good e39b37bdf3aeda8fed17aa7dff42a6fecfc4f262
# good: [97dc3f62c8f37795cdee3001e86c346dd7f7a879] scripts/gdb: add internal helper and convenience function for per-cpu lookup
git bisect good 97dc3f62c8f37795cdee3001e86c346dd7f7a879
# bad: [a2dc0f333a3dd8eba791afc848623c0a708ea2e4] Merge remote-tracking branch 'livepatching/for-next'
git bisect bad a2dc0f333a3dd8eba791afc848623c0a708ea2e4
# bad: [82bbadb13ef4b3e2217a2fe297be768caf473314] x86, entry: Remove int_check_syscall_exit_work
git bisect bad 82bbadb13ef4b3e2217a2fe297be768caf473314
# bad: [0bc5dd63915de8bac63ef63f6e75c3fecd0838d2] x86: entry_64.S: always allocate complete "struct pt_regs"
git bisect bad 0bc5dd63915de8bac63ef63f6e75c3fecd0838d2
# good: [2202eb90f175cf45d1b2d1c64dbb5676a8ad07ad] x86: introduce push/pop macros which generate CFI_REL_OFFSET and CFI_RESTORE
git bisect good 2202eb90f175cf45d1b2d1c64dbb5676a8ad07ad
# good: [f5e1c4084319a42e5f14d41e2d638949ce66bc08] x86: entry_64.S: fix wrong symbolic constant usage: R11->ARGOFFSET
git bisect good f5e1c4084319a42e5f14d41e2d638949ce66bc08
# first bad commit: [0bc5dd63915de8bac63ef63f6e75c3fecd0838d2] x86: entry_64.S: always allocate complete "struct pt_regs"
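
A note on reading the panic line in the console log: exitcode=0x0000000b can be decoded as a wait-style status, which would tie the death of init to the general protection traps logged just before it. The sketch below is only an illustration and is not part of the kernel or of the patch. The helper name decode_exitcode is made up here, and it assumes the usual Linux layout (low 7 bits are the terminating signal, bit 7 is the core-dump flag, the next byte is the exit status).

```python
import signal


def decode_exitcode(code: int) -> dict:
    """Split a wait-style status word into its fields.

    Assumption: the value printed by the panic uses the common Linux
    encoding (signal in bits 0-6, core-dump flag in bit 7, exit status
    in bits 8-15). decode_exitcode is a hypothetical helper name.
    """
    signo = code & 0x7F
    return {
        "signal": signo,
        "core_dumped": bool(code & 0x80),
        "exit_status": (code >> 8) & 0xFF,
        # Signal names are platform dependent; this lookup uses the
        # numbering of the machine running the script.
        "signal_name": signal.Signals(signo).name if signo else None,
    }


# Value copied from the "Attempted to kill init!" line in the log.
info = decode_exitcode(0x0000000B)
print(info)
```

On an x86 Linux machine, signal 11 is SIGSEGV, which is what user space normally receives for a general protection fault.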