On Wed, Jan 18, 2012 at 5:12 AM, Indan Zupancic wrote:
>
> So there is this gap and there is no good way to handle it at all for
> user space? And even if it's fixed in the kernel, that won't help with
> older kernels, so it will stay a problem for a while.

Correct.

> Can this int 0x80 trick be blocked for ptraced task (preferably always),
> pretty please?

Nope. Not that I can tell. The "unable to read $pc-2" is a hardware
feature, and we cannot stop users from running the "int 0x80" code.

The only way to block it is to simply not enable the 32-bit
compatibility mode at all, at which point the "int 0x80" interface
simply doesn't exist.

And sure, we could do something in the kernel (like saying that you
cannot do "int 0x80" from 64-bit code by explicitly testing in the
ia32_syscall function), but that has the same "even if it's fixed in
the kernel" issue.

You can test this feature out with a test-program something like this:

    #include <stdio.h>
    #include <stdlib.h>
    #include <signal.h>
    #define _GNU_SOURCE
    #include <unistd.h>
    #include <sys/syscall.h>

    void handler(int sig)
    {
        printf("SIGWINCH\n");
    }

    int main(unsigned int argc, char **argv)
    {
        signal(SIGWINCH, handler);
        asm("int $0x80": :"a" (29));    /* sys_pause - 32-bit */
        syscall(34);                    /* sys_pause - 64-bit */
    }

which does two "pause()" system calls from 64-bit mode, the first one
using the legacy system call interface.

At least "strace" gets really confused, and will show the first one as

    shmget(0x1c, 140734112566944, 0) = ? ERESTARTNOHAND (To be restarted)

because it assumes that in 64-bit mode, system call number 29 means
"shmget". It doesn't even look at $pc-2, which (since this code
doesn't try to obfuscate it) would have worked in this case.

I actually checked the strace source code. It has

    # if 0
        /* This version analyzes the opcode of a syscall instruction.
         * (int 0x80 on i386 vs. syscall on x86-64)
         * It works, but is too complicated.
         */
        unsigned long val, rip, i;

        if (upeek(tcp, 8*RIP, &rip) < 0)
            perror("upeek(RIP)");

        /* sizeof(syscall) == sizeof(int 0x80) == 2 */
        rip -= 2;
        errno = 0;
        ...

so there is code there that could make it work, but it's #ifdef'ed
out. The actually used code just does

        /* Check CS register value. On x86-64 linux it is:
         *      0x33    for long mode (64 bit)
         *      0x23    for compatibility mode (32 bit)
         * It takes only one ptrace and thus doesn't need
         * to be cached.
         */
        if (upeek(tcp, 8*CS, &val) < 0)
            return -1;
        switch (val) {
            case 0x23: currpers = 1; break;
            case 0x33: currpers = 0; break;

which is the reasonable and obvious approach.

I'm looking at "struct user_regs_struct" and there really isn't any
non-architected state there outside of "high bits".

There are high bits that we can hide things in outside of orig_ax - we
do have 64 bits for "cs" for example - but it all boils down to the
same issue: we *will* break something that thinks it knows the details
of this. The advantage of "orig_eax" would be that at least it makes
conceptual sense there.

Using the high bits of 'eflags' might work. Hopefully nobody tests that.

IOW, something like the attached might work. It just sets bit#32 in
eflags if the system call is a compat call.

With that, ptrace would at least be able to tell (assuming a new
kernel, of course - it would still need to have the "look at cs" as a
fallback) if it's a compat call or not, but it could do something like

    mode = (eflags >> 32) & 3;
    switch (mode) {
    case 0: .. guess it from CS ..
    case 1: 64-bit
    case 2: 32-bit
    default: Oddity.
    }

or something like that. The idea being that you can also see from
eflags whether the new feature is supported or not.

THIS IS TOTALLY UNTESTED!

                 Linus
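
For illustration, the tracer-side decoding described in the message above could be sketched roughly as follows. This is only a sketch of the proposal as stated (two mode bits in the high half of eflags, with the CS check as a fallback), which the message itself marks as untested. The enum, the macro names, and the functions abi_from_cs() and classify_syscall_abi() are hypothetical and come from neither strace nor the kernel. The CS values 0x33 and 0x23 are the ones quoted from the strace comment.

```c
#include <stdint.h>

/* Hypothetical result type; these names are illustrative only. */
enum syscall_abi {
    ABI_UNKNOWN = -1,   /* could not classify ("Oddity") */
    ABI_64BIT   = 0,    /* native 64-bit system call */
    ABI_32BIT   = 1     /* compat (int 0x80 style) system call */
};

/* CS selector values as quoted in the strace comment for x86-64 Linux. */
#define CS_LONG_MODE   0x33u
#define CS_COMPAT_MODE 0x23u

/*
 * Fallback for kernels without the proposed eflags bits: guess from CS.
 * This guess is wrong for "int 0x80" issued from 64-bit code, which is
 * exactly the gap the thread is about.
 */
static enum syscall_abi abi_from_cs(uint64_t cs)
{
    switch (cs) {
    case CS_LONG_MODE:
        return ABI_64BIT;
    case CS_COMPAT_MODE:
        return ABI_32BIT;
    default:
        return ABI_UNKNOWN;
    }
}

/*
 * Decode the proposed two-bit mode field at bits 32-33 of eflags:
 *   0 -> feature not reported, fall back to CS
 *   1 -> 64-bit call
 *   2 -> 32-bit (compat) call
 *   3 -> unexpected value
 */
static enum syscall_abi classify_syscall_abi(uint64_t eflags, uint64_t cs)
{
    unsigned int mode = (unsigned int)((eflags >> 32) & 3u);

    switch (mode) {
    case 0:
        return abi_from_cs(cs);
    case 1:
        return ABI_64BIT;
    case 2:
        return ABI_32BIT;
    default:
        return ABI_UNKNOWN;
    }
}
```

With the test program from the message, a tracer on a kernel carrying such a change would see CS == 0x33 for both calls, but mode 2 for the "int 0x80" one, so classify_syscall_abi() would return ABI_32BIT there. On an older kernel the mode field reads 0 and the function falls back to the CS guess.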