On 02/25/2015 01:37 PM, Andrey Wagin wrote: > 2015-02-13 0:54 GMT+03:00 Denys Vlasenko : >> 64-bit code was using six stack slots less by not saving/restoring >> registers which are callee-preserved according to C ABI, >> and not allocating space for them. >> Only when syscall needed a complete "struct pt_regs", >> the complete area was allocated and filled in. >> As an additional twist, on interrupt entry a "slightly less truncated pt_regs" >> trick is used, to make nested interrupt stacks easier to unwind. >> >> This proved to be a source of significant obfuscation and subtle bugs. >> For example, stub_fork had to pop the return address, >> extend the struct, save registers, and push return address back. Ugly. >> ia32_ptregs_common pops return address and "returns" via jmp insn, >> throwing a wrench into CPU return stack cache. >> >> This patch changes code to always allocate a complete "struct pt_regs". >> The saving of registers is still done lazily. >> >> "Partial pt_regs" trick on interrupt stack is retained. >> >> Macros which manipulate "struct pt_regs" on stack are reworked: >> ALLOC_PT_GPREGS_ON_STACK allocates the structure. >> SAVE_C_REGS saves to it those registers which are clobbered by C code. >> SAVE_EXTRA_REGS saves to it all other registers. >> Corresponding RESTORE_* and REMOVE_PT_GPREGS_FROM_STACK macros reverse it. >> >> ia32_ptregs_common, stub_fork and friends lost their ugly dance with >> return pointer. >> >> LOAD_ARGS32 in ia32entry.S now uses symbolic stack offsets >> instead of magic numbers. >> >> error_entry and save_paranoid now use SAVE_C_REGS + SAVE_EXTRA_REGS >> instead of having it open-coded yet again. >> >> Patch was run-tested: 64-bit executables, 32-bit executables, >> strace works. >> Timing tests did not show measurable difference in 32-bit >> and 64-bit syscalls. > > Hello Denys, > > My test vm doesn't boot with this patch. Could you help to investigate > this issue? I think I found it. This part of my patch is possibly wrong: @@ -171,9 +171,9 @@ static inline int arch_irqs_disabled(void) #define ARCH_LOCKDEP_SYS_EXIT_IRQ \ TRACE_IRQS_ON; \ sti; \ - SAVE_REST; \ + SAVE_EXTRA_REGS; \ LOCKDEP_SYS_EXIT; \ - RESTORE_REST; \ + RESTORE_EXTRA_REGS; \ cli; \ TRACE_IRQS_OFF; The "SAVE_REST" here is intended to really *push* extra regs on stack, but the patch changed it so that they are written to existing stack slots above. >From code inspection it should work in almost all cases, but some locations where it is used are really obscure. If there are places where *pushing* regs is really necessary, this can corrupt rbp,rbx,r12-15 registers. Your config has CONFIG_LOCKDEP=y, I think it's worth trying whether the bug was here. Please find updated patch attached. Can you try it? -- vda