* JIT emulator needs
From: Albert Cahalan @ 2007-06-08 6:35 UTC
To: linux-kernel

Right now, Linux isn't all that friendly to JIT emulators.
Here are the problems and suggestions to improve the situation.

There is an SE Linux execmem restriction that enforces W^X.
Assuming you don't wish to just disable SE Linux, there are
two ugly ways around the problem. You can mmap a file twice,
or you can abuse SysV shared memory. The mmap method requires
that you know of a filesystem mounted rw,exec where you can
write a very large temporary file. This arbitrary filesystem,
rather than swap space, will be the backing store. The SysV
shared memory method requires an undocumented flag and is
subject to some annoying size limits. Both methods create
objects that will fail to be deleted if the program dies
before marking the objects for deletion.

Processors often have annoying limits on the immediate values
in instructions. An x86 or x86_64 JIT can go a bit faster if
all allocations are kept to the low 2 GB of address space.
There are also reasons for a 32bit-to-x86_64 JIT to choose
a nearly arbitrary 2 GB region that lies above 4 GB.
Other archs have other limits, such as 32 MB or 256 MB.

Sometimes it is very helpful to have the read/write mapping
be a fixed offset from the read/exec mapping. A power of 2
can be especially desirable.

Emulators often need a cheap way to change page permissions.
One VMA per page is no good. Besides taking up space and making
many things generally slower, having one VMA per page causes
a huge performance loss for snapshot roll-back operations.
Just tearing down all those VMAs takes a good while.

Additions to better support JIT emulators:

a. sysctl to set IPC_RMID by default
b. shmget() flag to set IPC_RMID by default
c. open() flag to unlink a file before returning the fd
d. mremap() flag to always keep the old mapping
e. mremap() flag to get a read/write mapping of a read/exec one
f. mremap() flag to get a read/exec mapping of a read/write one
g. mremap() flag to make the 5th arg (new addr) be the upper limit
h. 6-bit wide mremap() "flag" to set the upper limit above given base
i. support the prot argument to remap_file_pages
j. a documented way (madvise?) to punch same-VMA zero-page holes
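
The "mmap a file twice" workaround described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: the helper name map_file_twice() and its parameters are made up here, and the caller supplies a mkstemp()-style template on a filesystem it believes is suitable.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): map one temporary file twice.
 * On success returns 0, stores a read/write view in *rw and a second view
 * of the same pages, mapped with second_prot, in *alias. */
int map_file_twice(char *path_template, size_t len, int second_prot,
                   void **rw, void **alias)
{
    assert(len > 0);
    int fd = mkstemp(path_template);    /* creates a named file */
    if (fd < 0)
        return -1;
    /* If the process dies between mkstemp() and unlink(), the file is
     * left behind: this is the debris window the message complains about. */
    unlink(path_template);
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    *rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    *alias = mmap(NULL, len, second_prot, MAP_SHARED, fd, 0);
    close(fd);                          /* the mappings keep the file alive */
    if (*rw == MAP_FAILED || *alias == MAP_FAILED) {
        if (*rw != MAP_FAILED)
            munmap(*rw, len);
        if (*alias != MAP_FAILED)
            munmap(*alias, len);
        return -1;
    }
    return 0;
}
```

A JIT would pass PROT_READ | PROT_EXEC as second_prot, which only works if the template points at a filesystem mounted exec; the backing store is then that filesystem rather than swap, as noted above.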

* Re: JIT emulator needs
From: Eric Dumazet @ 2007-06-08 7:09 UTC
To: Albert Cahalan
Cc: linux-kernel, Davide Libenzi

Albert Cahalan wrote:
> Right now, Linux isn't all that friendly to JIT emulators.
> Here are the problems and suggestions to improve the situation.
>
> There is an SE Linux execmem restriction that enforces W^X.
> Assuming you don't wish to just disable SE Linux, there are
> two ugly ways around the problem. You can mmap a file twice,
> or you can abuse SysV shared memory. The mmap method requires
> that you know of a filesystem mounted rw,exec where you can
> write a very large temporary file. This arbitrary filesystem,
> rather than swap space, will be the backing store. The SysV
> shared memory method requires an undocumented flag and is
> subject to some annoying size limits. Both methods create
> objects that will fail to be deleted if the program dies
> before marking the objects for deletion.
>
> Processors often have annoying limits on the immediate values
> in instructions. An x86 or x86_64 JIT can go a bit faster if
> all allocations are kept to the low 2 GB of address space.
> There are also reasons for a 32bit-to-x86_64 JIT to choose
> a nearly arbitrary 2 GB region that lies above 4 GB.
> Other archs have other limits, such as 32 MB or 256 MB.
>
> Sometimes it is very helpful to have the read/write mapping
> be a fixed offset from the read/exec mapping. A power of 2
> can be especially desirable.
>
> Emulators often need a cheap way to change page permissions.
> One VMA per page is no good. Besides taking up space and making
> many things generally slower, having one VMA per page causes
> a huge performance loss for snapshot roll-back operations.
> Just tearing down all those VMAs takes a good while.
>
> Additions to better support JIT emulators:
>
> a. sysctl to set IPC_RMID by default

Not very good, this will break some apps.

> b. shmget() flag to set IPC_RMID by default

This is better :)

> c. open() flag to unlink a file before returning the fd

Well, I assume you would like
fd = open("/path/somefile", O_RDWR | O_CREAT | O_UNLINK, 0644)

(i.e. allocate a file handle but no name?)

Quite difficult to implement this atomically with the current vfs; maybe a
new syscall would be better. (Linus will kill me for that :) )

(We don't need to insert "somefile" in one directory, then unlink it; we
only need to allocate an unnamed inode to get some backing store.)

This is a generalization of anonymous inodes ( fs/anon_inodes.c )

* Re: JIT emulator needs
From: Albert Cahalan @ 2007-06-09 4:12 UTC
To: Eric Dumazet
Cc: linux-kernel, Davide Libenzi

On 6/8/07, Eric Dumazet <dada1@cosmosbay.com> wrote:
> Albert Cahalan wrote:
> > Additions to better support JIT emulators:
> >
> > a. sysctl to set IPC_RMID by default
>
> Not very good, this will break some apps.

As a sysctl, the admin gets to choose between compatibility
and sanity. I can see such a sysctl also being really helpful
for a shared computer used for an Operating Systems or
System Programming course.

> > b. shmget() flag to set IPC_RMID by default
>
> This is better :)

Both are good. This one requires that all apps using SysV
shared memory be modified to use the flag. The other requires
that a very few apps be modified to tolerate a behavior change.

> > c. open() flag to unlink a file before returning the fd
>
>
> Well, I assume you would like fd = open("/path/somefile", O_RDWR | O_CREAT |
> O_UNLINK, 0644)
>
> (i.e. allocate a file handle but no name?)

Yes.

> Quite difficult to implement this atomically with the current vfs; maybe a new
> syscall would be better. (Linus will kill me for that :) )
>
> (We don't need to insert "somefile" in one directory, then unlink it; we only
> need to allocate an unnamed inode to get some backing store.)

I suspect that SMB/CIFS has a native call for this. There is
some sort of tmpfile flag defined over in that world.
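
For reference, the IPC_RMID dance that items (a) and (b) would make unnecessary looks roughly like this today. It is a minimal sketch; the helper name shm_alloc_autodelete() is made up for illustration and is not an existing API.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper (illustration only): create a private SysV segment,
 * attach it, and mark it for deletion right away.
 * Returns the attached address, or NULL on failure. */
void *shm_alloc_autodelete(size_t len)
{
    assert(len > 0);
    int id = shmget(IPC_PRIVATE, len, IPC_CREAT | 0600);
    if (id < 0)
        return NULL;
    /* A crash between shmget() and shmctl(IPC_RMID) leaks the segment
     * until somebody removes it by hand or the machine reboots. */
    void *p = shmat(id, NULL, 0);
    /* Mark for deletion; the segment is destroyed after the last detach
     * (immediately, if the attach above failed). */
    shmctl(id, IPC_RMID, NULL);
    return p == (void *)-1 ? NULL : p;
}
```

The segment goes away when the process detaches or exits, but only because the IPC_RMID call was reached; a creation-time flag or a sysctl default would close the window between the first two calls.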

* Re: JIT emulator needs
From: Alan Cox @ 2007-06-08 11:10 UTC
To: Albert Cahalan
Cc: linux-kernel

> There is an SE Linux execmem restriction that enforces W^X.

This depends on whatever SELinux rulesets you are running. It's just a
good rule to have present that most programs shouldn't be self patching,
and then label those that do differently.

> Sometimes it is very helpful to have the read/write mapping
> be a fixed offset from the read/exec mapping. A power of 2
> can be especially desirable.

mmap MAP_FIXED can do this but you need to know a lot about the memory
layout of the system so it gets a bit platform specific.

> Emulators often need a cheap way to change page permissions.

mprotect(, range) rather than a page at a time. The kernel will do
merging.

> a. sysctl to set IPC_RMID by default
> b. shmget() flag to set IPC_RMID by default

Use POSIX shared memory

> c. open() flag to unlink a file before returning the fd

Is it really that costly to create a blank file, why do you need to do it
a lot in a JIT ?

> e. mremap() flag to get a read/write mapping of a read/exec one
> f. mremap() flag to get a read/exec mapping of a read/write one
> g. mremap() flag to make the 5th arg (new addr) be the upper limit

This is all mprotect and munmap.

> h. 6-bit wide mremap() "flag" to set the upper limit above given base
> i. support the prot argument to remap_file_pages
> j. a documented way (madvise?) to punch same-VMA zero-page holes

mmap (although you get more VMAs from that) so memset() is probably
genuinely cheaper if the permissions are not changing.
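
The range form of mprotect() suggested above could look like the sketch below. The helper name protect_pages() and its page-index interface are assumptions made for illustration; only mprotect() and sysconf() are real calls.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (illustration only): change the protection of
 * pages [first, first + count) of a mapping with a single syscall.
 * Returns mprotect()'s result: 0 on success, -1 on error. */
int protect_pages(void *base, size_t first, size_t count, int prot)
{
    assert(count > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return mprotect((char *)base + first * page, count * page, prot);
}
```

Protecting a sub-range this way splits the surrounding VMA, and adjacent VMAs with identical attributes can be merged again when the protection is restored. How much that helps depends on how contiguous the affected pages are.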

* Re: JIT emulator needs
From: Nicholas Miell @ 2007-06-08 16:35 UTC
To: Alan Cox
Cc: Albert Cahalan, linux-kernel

On Fri, 2007-06-08 at 12:10 +0100, Alan Cox wrote:
> > e. mremap() flag to get a read/write mapping of a read/exec one
> > f. mremap() flag to get a read/exec mapping of a read/write one
> > g. mremap() flag to make the 5th arg (new addr) be the upper limit
>
> This is all mprotect and munmap.

I think he's asking for a way to copy an existing mapping, which does
sound genuinely useful.

(i.e. mremap(ptr, size, size, MREMAP_COPY), with no need to mess with
files to get multiple mappings of the same region)

--
Nicholas Miell <nmiell@comcast.net>
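
MREMAP_COPY above is a proposal, not an existing flag. The closest existing behavior I am aware of is the old_size == 0 case described in the mremap(2) man page, which creates a second mapping of the same pages but only for shareable (MAP_SHARED) mappings. A sketch, with the helper name alias_shared_mapping() made up for illustration:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper (illustration only): obtain a second view of an
 * existing MAP_SHARED mapping. Returns NULL on failure. */
void *alias_shared_mapping(void *addr, size_t len)
{
    assert(len > 0);
    /* old_size == 0 asks mremap() for a new mapping of the same pages
     * instead of moving the old one; MREMAP_MAYMOVE is required. */
    void *p = mremap(addr, 0, len, MREMAP_MAYMOVE);
    return p == MAP_FAILED ? NULL : p;
}
```

This does not cover private anonymous memory, and the alias starts out with the same protections as the original, so it is only a partial answer to the request.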

* Re: JIT emulator needs
From: Albert Cahalan @ 2007-06-09 5:17 UTC
To: Alan Cox
Cc: linux-kernel

On 6/8/07, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > There is an SE Linux execmem restriction that enforces W^X.
>
> This depends on whatever SELinux rulesets you are running. It's just a
> good rule to have present that most programs shouldn't be self patching,
> and then label those that do differently.

A marking in the executable would have made more sense.
It is really broken having an unprivileged user being able
to create whole new executables but unable to lift this
restriction on those executables.

In any case, the restriction is common and troublesome.

> > Sometimes it is very helpful to have the read/write mapping
> > be a fixed offset from the read/exec mapping. A power of 2
> > can be especially desirable.
>
> mmap MAP_FIXED can do this but you need to know a lot about the memory
> layout of the system so it gets a bit platform specific.

Yes. There are unportable programs, and UNPORTABLE ones.
Memory layout can vary between vendor kernels, between
normal and 32-on-64 situations, between two different
C libraries...

> > Emulators often need a cheap way to change page permissions.
>
> mprotect(, range) rather than a page at a time. The kernel will do
> merging.

Nope. This can happen rapidly and repeatedly to pages
that are essentially random. The median length of a range
will be a page or two. Merging won't do very much at all.

> > a. sysctl to set IPC_RMID by default
> > b. shmget() flag to set IPC_RMID by default
>
> Use POSIX shared memory

That appears to have the exact same problem.

> > c. open() flag to unlink a file before returning the fd
>
> Is it really that costly to create a blank file, why do you need to do it
> a lot in a JIT ?

This part isn't about cost. It's about not leaving around
debris when the JIT crashes.

> > e. mremap() flag to get a read/write mapping of a read/exec one
> > f. mremap() flag to get a read/exec mapping of a read/write one
> > g. mremap() flag to make the 5th arg (new addr) be the upper limit
>
> This is all mprotect and munmap.

That won't get me a second mapping. Supposing that I had a
second mapping, SE Linux would deny the mprotect. I'm looking
for a mapping that is born executable or a mapping that is
born writable, as needed, so that no transition is needed.

> > h. 6-bit wide mremap() "flag" to set the upper limit above given base
> > i. support the prot argument to remap_file_pages
> > j. a documented way (madvise?) to punch same-VMA zero-page holes
>
> mmap (although you get more VMAs from that) so memset() is probably
> genuinely cheaper if the permissions are not changing.

Well cost is the problem here. I sure can find some way to get
the operation done, but it isn't cheap. For some usages, the
current setup is costly enough that one must consider abandoning
the hardware MMU in favor of a software one emitted as part of
the JIT. :-(

* Re: JIT emulator needs
From: H. Peter Anvin @ 2007-06-09 20:00 UTC
To: Albert Cahalan
Cc: linux-kernel

Albert Cahalan wrote:
> There is an SE Linux execmem restriction that enforces W^X.
> Assuming you don't wish to just disable SE Linux, there are
> two ugly ways around the problem.

This should be fixed in SELinux, or more accurately the SELinux profile.
There is absolutely no other sane option.

Of course, you generally don't need a page to be writable and
executable at the same time, but the overhead of switching can be enormous.

	-hpa

* Re: JIT emulator needs
From: William Lee Irwin III @ 2007-06-19 15:08 UTC
To: Albert Cahalan
Cc: linux-kernel

On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
> Right now, Linux isn't all that friendly to JIT emulators.
> Here are the problems and suggestions to improve the situation.
> There is an SE Linux execmem restriction that enforces W^X.
> Assuming you don't wish to just disable SE Linux, there are
> two ugly ways around the problem. You can mmap a file twice,
> or you can abuse SysV shared memory. The mmap method requires
> that you know of a filesystem mounted rw,exec where you can
> write a very large temporary file. This arbitrary filesystem,
> rather than swap space, will be the backing store. The SysV
> shared memory method requires an undocumented flag and is
> subject to some annoying size limits. Both methods create
> objects that will fail to be deleted if the program dies
> before marking the objects for deletion.

If the policy forbidding self-modifying code lacks a method of
exempting programs such as JIT interpreters (which I doubt) then
it's a problem. I'm with Alan on this one.

On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
> Processors often have annoying limits on the immediate values
> in instructions. An x86 or x86_64 JIT can go a bit faster if
> all allocations are kept to the low 2 GB of address space.
> There are also reasons for a 32bit-to-x86_64 JIT to choose
> a nearly arbitrary 2 GB region that lies above 4 GB.
> Other archs have other limits, such as 32 MB or 256 MB.

This sort of logic might be appropriate for a sort of parametrized
and specialized vma allocator setting the policy in /proc/ along
with various sorts of limits. There are limits to such and at some
point things will have to manually manage their own process address
spaces in a platform-specific fashion. If kernel assistance here is
rejected they may have to do so in all cases.

On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
> Sometimes it is very helpful to have the read/write mapping
> be a fixed offset from the read/exec mapping. A power of 2
> can be especially desirable.

As far as the kernel is concerned they're unrelated, so this will
likely need MAP_FIXED barring a staggering array of fresh system
calls to act on tuples of memory ranges in lockstep.

On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
> Emulators often need a cheap way to change page permissions.
> One VMA per page is no good. Besides taking up space and making
> many things generally slower, having one VMA per page causes
> a huge performance loss for snapshot roll-back operations.
> Just tearing down all those VMAs takes a good while.

remap_file_pages_prot() is reputedly waiting in the wings somewhere
for this.

On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
> Additions to better support JIT emulators:
> a. sysctl to set IPC_RMID by default

This is a bad idea. The standard semantics are needed for programs
relying upon them.

On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
> b. shmget() flag to set IPC_RMID by default

This is relatively innocuous.

On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
> c. open() flag to unlink a file before returning the fd

You probably want a tmpfile(3) -like affair which never has a pathname
to begin with. It could be useful for security purposes more generally.

On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
> d. mremap() flag to always keep the old mapping

This sounds vaguely like another syscall, like mdup(). This is
particularly meaningful in the context of anonymous memory, for
which there is no method of replicating mappings within a single
process address space.

On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
> e. mremap() flag to get a read/write mapping of a read/exec one
> f. mremap() flag to get a read/exec mapping of a read/write one

Presumably to be used in conjunction with keeping the old mapping.
A composite mdup()/mremap() and mprotect(), presumably saving a TLB
flush or other sorts of overhead, may make some sort of sense here.
Odds are it'll get rejected as the sequence of syscalls is a rather
precise equivalent, though it would optimize things (as would other
composite syscalls, e.g. ones combining fork() and execve() etc.).

On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
> g. mremap() flag to make the 5th arg (new addr) be the upper limit
> h. 6-bit wide mremap() "flag" to set the upper limit above given base

Essentially more placement support for mremap()/mdup(). It's not clear
to me those particular semantics are the ideal ones. A target range
for placement should do, if not manual address space management.

On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
> i. support the prot argument to remap_file_pages

This is probably going to happen anyway.

On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
> j. a documented way (madvise?) to punch same-VMA zero-page holes

This is MADV_REMOVE, though most filesystems don't support it. Do you
need it for more than tmpfs?

-- wli
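
As an illustration of the MADV_REMOVE suggestion for item (j), here is a minimal sketch. The helper name punch_hole() is made up, and the usage assumes a mapping backed by tmpfs or shared memory; as noted above, other filesystems may reject the call.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (illustration only): free the backing store of
 * [addr, addr + len) without unmapping it, so the range stays inside
 * the same VMA. addr and len are assumed to be page-aligned.
 * Returns madvise()'s result: 0 on success, -1 on error. */
int punch_hole(void *addr, size_t len)
{
    assert(len > 0);
    return madvise(addr, len, MADV_REMOVE);
}
```

After a successful call, reads from the range should see zero-filled pages while pages outside the range keep their contents.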

* Re: JIT emulator needs
From: Albert Cahalan @ 2007-06-20 3:16 UTC
To: William Lee Irwin III
Cc: linux-kernel

On 6/19/07, William Lee Irwin III <wli@holomorphy.com> wrote:
> On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
>> Right now, Linux isn't all that friendly to JIT emulators.
>> Here are the problems and suggestions to improve the situation.
>> There is an SE Linux execmem restriction that enforces W^X.
>> Assuming you don't wish to just disable SE Linux, there are
>> two ugly ways around the problem. You can mmap a file twice,
>> or you can abuse SysV shared memory. The mmap method requires
>> that you know of a filesystem mounted rw,exec where you can
>> write a very large temporary file. This arbitrary filesystem,
>> rather than swap space, will be the backing store. The SysV
>> shared memory method requires an undocumented flag and is
>> subject to some annoying size limits. Both methods create
>> objects that will fail to be deleted if the program dies
>> before marking the objects for deletion.
>
> If the policy forbidding self-modifying code lacks a method of
> exempting programs such as JIT interpreters (which I doubt) then
> it's a problem. I'm with Alan on this one.

It does and it doesn't. There is not a reasonable way for a
user to mark an app as needing full self-modifying ability.
It's not like the executable stack, which can be set via the
ELF note markings on the executable. (ELF note markings are
ideal because they can not be used via a ret-to-libc attack)

With admin privs, one can change SE Linux settings. Mark the
executable, disable the protection system-wide, generate a
completely new SE Linux policy, or just turn SE Linux off.

Normally we don't expect/require admin privs to install an
executable in one's own ~/bin directory. This is broken.

It ought to be easier to get a JIT working well without
enabling arbitrary mprotect. This would allow a JIT to
partially benefit from the recent security enhancements.
(think of all the buggy browser-based JIT things!)

> On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
>> Processors often have annoying limits on the immediate values
>> in instructions. An x86 or x86_64 JIT can go a bit faster if
>> all allocations are kept to the low 2 GB of address space.
>> There are also reasons for a 32bit-to-x86_64 JIT to choose
>> a nearly arbitrary 2 GB region that lies above 4 GB.
>> Other archs have other limits, such as 32 MB or 256 MB.
>
> This sort of logic might be appropriate for a sort of parametrized
> and specialized vma allocator setting the policy in /proc/ along
> with various sorts of limits. There are limits to such and at some
> point things will have to manually manage their own process address
> spaces in a platform-specific fashion. If kernel assistance here is
> rejected they may have to do so in all cases.

I prefer ELF notes (for start-up allocations) and prctl,
plus a mmap flag for per-allocation behavior.

> On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
>> Additions to better support JIT emulators:
>> a. sysctl to set IPC_RMID by default
>
> This is a bad idea. The standard semantics are needed for programs
> relying upon them.

I didn't mean that the default default :-) setting would change.
I meant that people could change the behavior from a boot script.
Things that break are really foul and nasty anyway, probably with
serious problems that ought to get fixed.

> On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
>> c. open() flag to unlink a file before returning the fd
>
> You probably want a tmpfile(3) -like affair which never has a pathname
> to begin with. It could be useful for security purposes more generally.

Yes, exactly. I think there are some possible optimizations
available too, particularly with the cifs filesystem.

> On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
>> d. mremap() flag to always keep the old mapping
>
> This sounds vaguely like another syscall, like mdup(). This is
> particularly meaningful in the context of anonymous memory, for
> which there is no method of replicating mappings within a single
> process address space.

Yes, mdup() and probably mdup2(). It could be mremap flags or not.
JIT emulators generally need a second mapping so that they can
have both read/write and execute for the same physical memory.
It is somewhat tolerable to have SE Linux enforce that the second
mapping be randomized. (it helps security greatly, but slows the
emulator by a tiny bit)

> On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
>> e. mremap() flag to get a read/write mapping of a read/exec one
>> f. mremap() flag to get a read/exec mapping of a read/write one
>
> Presumably to be used in conjunction with keeping the old mapping.
> A composite mdup()/mremap() and mprotect(), presumably saving a TLB
> flush or other sorts of overhead, may make some sort of sense here.
> Odds are it'll get rejected as the sequence of syscalls is a rather
> precise equivalent, though it would optimize things (as would other
> composite syscalls, e.g. ones combining fork() and execve() etc.).

A few mremap flags ought to do the job I think.

> On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
>> g. mremap() flag to make the 5th arg (new addr) be the upper limit
>> h. 6-bit wide mremap() "flag" to set the upper limit above given base
>
> Essentially more placement support for mremap()/mdup(). It's not clear
> to me those particular semantics are the ideal ones. A target range
> for placement should do, if not manual address space management.

Yes. I'm looking for the change that will help JIT emulators
the most while hurting security the least.

> On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
>> i. support the prot argument to remap_file_pages
>
> This is probably going to happen anyway.

Great.

> On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote:
>> j. a documented way (madvise?) to punch same-VMA zero-page holes
>
> This is MADV_REMOVE, though most filesystems don't support it. Do you
> need it for more than tmpfs?

Yes and no. It's painful to be restricted to one backing store.
Covering MAP_ANONYMOUS and SysV shared mem is most critical.
I suppose that other filesystems may require multiple flags to
deal with the desire to (not) punch a hole on disk and what to
do if that isn't possible.

* Re: JIT emulator needs
From: William Lee Irwin III @ 2007-06-20 16:01 UTC
To: Albert Cahalan
Cc: linux-kernel

On 6/19/07, William Lee Irwin III <wli@holomorphy.com> wrote:
>> If the policy forbidding self-modifying code lacks a method of
>> exempting programs such as JIT interpreters (which I doubt) then
>> it's a problem. I'm with Alan on this one.

On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote:
> It does and it doesn't. There is not a reasonable way for a
> user to mark an app as needing full self-modifying ability.
> It's not like the executable stack, which can be set via the
> ELF note markings on the executable. (ELF note markings are
> ideal because they can not be used via a ret-to-libc attack)
> With admin privs, one can change SE Linux settings. Mark the
> executable, disable the protection system-wide, generate a
> completely new SE Linux policy, or just turn SE Linux off.
> Normally we don't expect/require admin privs to install an
> executable in one's own ~/bin directory. This is broken.
> It ought to be easier to get a JIT working well without
> enabling arbitrary mprotect. This would allow a JIT to
> partially benefit from the recent security enhancements.
> (think of all the buggy browser-based JIT things!)

I presumed an ELF note or extended filesystem attributes were already
in place for this sort of affair. It may be that the model implemented
is so restrictive that users are forbidden to create new executables,
in which case using a different model is certainly in order. Otherwise
the ELF note or attributes need to be implemented.

On 6/19/07, William Lee Irwin III <wli@holomorphy.com> wrote:
>> This sort of logic might be appropriate for a sort of parametrized
>> and specialized vma allocator setting the policy in /proc/ along
>> with various sorts of limits. There are limits to such and at some
>> point things will have to manually manage their own process address
>> spaces in a platform-specific fashion. If kernel assistance here is
>> rejected they may have to do so in all cases.

On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote:
> I prefer ELF notes (for start-up allocations) and prctl,
> plus a mmap flag for per-allocation behavior.

Beware that the kernel (upstream of me) will likely refuse to support
exotic mmap() placement policies. At that point userspace will
have to implement them itself with a front-end to mmap(). Userspace
can actually live without kernel placement support for everything but
the executable itself, which is already implemented via ELF loading
standards. This is not to downplay the tremendous amounts of pain
involved for moving the stack, getting ld.so to land in the right
place, and so on. Actually I'm less sure about .interp placement. In
any event, exotic virtualspace allocation policies are largely yet
another "simple matter of programming" implementable entirely in
userspace.

On 6/19/07, William Lee Irwin III <wli@holomorphy.com> wrote:
>> This is a bad idea. The standard semantics are needed for programs
>> relying upon them.

On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote:
> I didn't mean that the default default :-) setting would change.
> I meant that people could change the behavior from a boot script.
> Things that break are really foul and nasty anyway, probably with
> serious problems that ought to get fixed.

It's actually not a good idea to make it the default even via sysctl.
People won't realize something will break until it does, and what will
break is likely to be a database responsible for data integrity. The
IPC_RMID creation flag should suffice.

On 6/19/07, William Lee Irwin III <wli@holomorphy.com> wrote:
>> You probably want a tmpfile(3) -like affair which never has a pathname
>> to begin with. It could be useful for security purposes more generally.

On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote:
> Yes, exactly. I think there are some possible optimizations
> available too, particularly with the cifs filesystem.

I doubt this will be controversial, but it's not clear to me that
there is any convenient way to obtain an anonymous inode on anything
but tmpfs, in which case it's not really anonymous, but not visible to
userspace on account of the default kern_mount(). Essentially it's
possible to hoist the tmpfile name generation in-kernel to where it's
in a disconnected namespace not visible to any userspace whatsoever,
and kernel threads can cooperatively ensure safety via access
discipline. Alternatively, one could kern_mount() a fresh tmpfs
filesystem for some concurrency domain, e.g. per-uid, per-process, or
per-thread.

On 6/19/07, William Lee Irwin III <wli@holomorphy.com> wrote:
>> This sounds vaguely like another syscall, like mdup(). This is
>> particularly meaningful in the context of anonymous memory, for
>> which there is no method of replicating mappings within a single
>> process address space.

On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote:
> Yes, mdup() and probably mdup2(). It could be mremap flags or not.
> JIT emulators generally need a second mapping so that they can
> have both read/write and execute for the same physical memory.
> It is somewhat tolerable to have SE Linux enforce that the second
> mapping be randomized. (it helps security greatly, but slows the
> emulator by a tiny bit)

I think this may be doable via an mremap() flag barring needing to
break it up into multiple syscalls so it's implementable on all
architectures. That itself will be so difficult to get merged the
duplication may have to stand on its own as an mremap() flag.

On 6/19/07, William Lee Irwin III <wli@holomorphy.com> wrote:
>> Presumably to be used in conjunction with keeping the old mapping.
>> A composite mdup()/mremap() and mprotect(), presumably saving a TLB
>> flush or other sorts of overhead, may make some sort of sense here.
>> Odds are it'll get rejected as the sequence of syscalls is a rather
>> precise equivalent, though it would optimize things (as would other
>> composite syscalls, e.g. ones combining fork() and execve() etc.).

On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote:
> A few mremap flags ought to do the job I think.

mremap() already has so many arguments this is going to be difficult
to get merged. Breaking it up into multiple syscalls will not be easy
to get past people, and there are architectures that can't implement
syscalls with too many arguments.

On 6/19/07, William Lee Irwin III <wli@holomorphy.com> wrote:
>> This is MADV_REMOVE, though most filesystems don't support it. Do you
>> need it for more than tmpfs?

On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote:
> Yes and no. It's painful to be restricted to one backing store.
> Covering MAP_ANONYMOUS and SysV shared mem is most critical.
> I suppose that other filesystems may require multiple flags to
> deal with the desire to (not) punch a hole on disk and what to
> do if that isn't possible.

If those two are the bare necessities, they're already in place.

-- wli
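
A userspace "front-end to mmap()" of the kind mentioned above might start out like this sketch. The function name mmap_in_range() and its interface are assumptions made for illustration; it relies only on the documented behavior that an address passed without MAP_FIXED is treated as a hint.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical userspace placement front-end (illustration only):
 * request anonymous memory starting near lo and accept it only if the
 * whole mapping lies inside [lo, hi). Returns NULL otherwise. */
void *mmap_in_range(uintptr_t lo, uintptr_t hi, size_t len, int prot)
{
    if (len == 0 || hi < lo || hi - lo < len)
        return NULL;
    /* Without MAP_FIXED the address is only a hint; the kernel may pick
     * a different one, so the result has to be checked. */
    void *p = mmap((void *)lo, len, prot,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    uintptr_t a = (uintptr_t)p;
    if (a < lo || a > hi - len) {   /* mapping would end past hi */
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

A fuller allocator would retry with other hints inside the window or parse /proc/self/maps to find free space, which is where the platform-specific pain described above comes in.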
* Re: JIT emulator needs 2007-06-20 16:01 ` William Lee Irwin III @ 2007-06-20 16:37 ` H. Peter Anvin 2007-06-20 17:54 ` William Lee Irwin III 2007-06-20 18:25 ` Albert Cahalan 2007-06-20 18:43 ` Albert Cahalan 1 sibling, 2 replies; 28+ messages in thread From: H. Peter Anvin @ 2007-06-20 16:37 UTC (permalink / raw) To: William Lee Irwin III; +Cc: Albert Cahalan, linux-kernel William Lee Irwin III wrote: > > I presumed an ELF note or extended filesystem attributes were already > in place for this sort of affair. It may be that the model implemented > is so restrictive that users are forbidden to create new executables, > in which case using a different model is certainly in order. Otherwise > the ELF note or attributes need to be implemented. > Another thing to keep in mind, since we're talking about security policies in the first place, is that anything like this *MUST* be "opt-in" on the part of the security policy, because what we're talking about is circumventing an explicit security policy just based on a user-provided binary saying, in effect, "don't worry, I know what I'm doing." Changing the meaning of an established explicit security policy is not acceptable. -hpa ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: JIT emulator needs 2007-06-20 16:37 ` H. Peter Anvin @ 2007-06-20 17:54 ` William Lee Irwin III 2007-06-20 18:23 ` H. Peter Anvin 2007-06-20 18:25 ` Albert Cahalan 1 sibling, 1 reply; 28+ messages in thread From: William Lee Irwin III @ 2007-06-20 17:54 UTC (permalink / raw) To: H. Peter Anvin; +Cc: Albert Cahalan, linux-kernel William Lee Irwin III wrote: >> I presumed an ELF note or extended filesystem attributes were already >> in place for this sort of affair. It may be that the model implemented >> is so restrictive that users are forbidden to create new executables, >> in which case using a different model is certainly in order. Otherwise >> the ELF note or attributes need to be implemented. On Wed, Jun 20, 2007 at 09:37:31AM -0700, H. Peter Anvin wrote: > Another thing to keep in mind, since we're talking about security > policies in the first place, is that anything like this *MUST* be > "opt-in" on the part of the security policy, because what we're talking > about is circumventing an explicit security policy just based on a > user-provided binary saying, in effect, "don't worry, I know what I'm > doing." > Changing the meaning of an established explicit security policy is not > acceptable. This is what I had in mind with the commentary on the intentions of the policy. Thank you for correcting my hamhanded attempt to describe it. -- wli ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: JIT emulator needs 2007-06-20 17:54 ` William Lee Irwin III @ 2007-06-20 18:23 ` H. Peter Anvin 0 siblings, 0 replies; 28+ messages in thread From: H. Peter Anvin @ 2007-06-20 18:23 UTC (permalink / raw) To: William Lee Irwin III; +Cc: Albert Cahalan, linux-kernel William Lee Irwin III wrote: > William Lee Irwin III wrote: >>> I presumed an ELF note or extended filesystem attributes were already >>> in place for this sort of affair. It may be that the model implemented >>> is so restrictive that users are forbidden to create new executables, >>> in which case using a different model is certainly in order. Otherwise >>> the ELF note or attributes need to be implemented. > > On Wed, Jun 20, 2007 at 09:37:31AM -0700, H. Peter Anvin wrote: >> Another thing to keep in mind, since we're talking about security >> policies in the first place, is that anything like this *MUST* be >> "opt-in" on the part of the security policy, because what we're talking >> about is circumventing an explicit security policy just based on a >> user-provided binary saying, in effect, "don't worry, I know what I'm >> doing." >> Changing the meaning of an established explicit security policy is not >> acceptable. > > This is what I had in mind with the commentary on the intentions of the > policy. Thank you for correcting my hamhanded attempt to describe it. > Right. It's important to notice that it's actually more of an issue if the user can create executables, but the policy doesn't want to allow them to run bypassing the policy. -hpa ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: JIT emulator needs 2007-06-20 16:37 ` H. Peter Anvin 2007-06-20 17:54 ` William Lee Irwin III @ 2007-06-20 18:25 ` Albert Cahalan 2007-06-20 18:51 ` H. Peter Anvin 1 sibling, 1 reply; 28+ messages in thread From: Albert Cahalan @ 2007-06-20 18:25 UTC (permalink / raw) To: H. Peter Anvin; +Cc: William Lee Irwin III, linux-kernel On 6/20/07, H. Peter Anvin <hpa@zytor.com> wrote: > William Lee Irwin III wrote: > > I presumed an ELF note or extended filesystem attributes were already > > in place for this sort of affair. It may be that the model implemented > > is so restrictive that users are forbidden to create new executables, > > in which case using a different model is certainly in order. Otherwise > > the ELF note or attributes need to be implemented. > > Another thing to keep in mind, since we're talking about security > policies in the first place, is that anything like this *MUST* be > "opt-in" on the part of the security policy, because what we're talking > about is circumventing an explicit security policy just based on a > user-provided binary saying, in effect, "don't worry, I know what I'm > doing." > > Changing the meaning of an established explicit security policy is not > acceptable. Not in this case. If an attacker can CHANGE THE BINARY then it's already game over. Putting this into the security policy was an error born of laziness to begin with. Abuse of the security mechanism was easier than hacking the toolchain, ELF loader, etc. Either a binary needs self-modification, or it doesn't. This is determined by the author of the code. If you don't trust an executable that needs this ability, then you simply cannot run it in a useful way. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: JIT emulator needs 2007-06-20 18:25 ` Albert Cahalan @ 2007-06-20 18:51 ` H. Peter Anvin 2007-06-21 3:21 ` Albert Cahalan 0 siblings, 1 reply; 28+ messages in thread From: H. Peter Anvin @ 2007-06-20 18:51 UTC (permalink / raw) To: Albert Cahalan; +Cc: William Lee Irwin III, linux-kernel Albert Cahalan wrote: > Putting this into the security policy was an error born of > laziness to begin with. Abuse of the security mechanism > was easier than hacking the toolchain, ELF loader, etc. > > Either a binary needs self-modification, or it doesn't. This is > determined by the author of the code. If you don't trust an > executable that needs this ability, then you simply cannot > run it in a useful way. That's fine. That's a policy decision. That's what a security policy *is*. The owner of the system has decided, by security policy, that that is not allowed. Bypassing that is not acceptable. -hpa ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: JIT emulator needs 2007-06-20 18:51 ` H. Peter Anvin @ 2007-06-21 3:21 ` Albert Cahalan 2007-06-21 3:32 ` H. Peter Anvin 0 siblings, 1 reply; 28+ messages in thread From: Albert Cahalan @ 2007-06-21 3:21 UTC (permalink / raw) To: H. Peter Anvin; +Cc: William Lee Irwin III, linux-kernel On 6/20/07, H. Peter Anvin <hpa@zytor.com> wrote: > Albert Cahalan wrote: > > Putting this into the security policy was an error born of > > laziness to begin with. Abuse of the security mechanism > > was easier than hacking the toolchain, ELF loader, etc. > > > > Either a binary needs self-modification, or it doesn't. This is > > determined by the author of the code. If you don't trust an > > executable that needs this ability, then you simply cannot > > run it in a useful way. > > That's fine. That's a policy decision. That's what a security policy > *is*. The owner of the system has decided, by security policy, that > that is not allowed. Bypassing that is not acceptable. Fixing a bug should be acceptable. Look, let's back up a bit here. At a high level, what exactly do you imagine that this behavior was intended for? I suggest you list some examples of the attacks that are blocked. Can you come up with a reasonable argument that the current behavior is the least painful restriction required to block those attacks? Does the current behavior block any attack that the proposed behavior would not? (list the attacks please) ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: JIT emulator needs 2007-06-21 3:21 ` Albert Cahalan @ 2007-06-21 3:32 ` H. Peter Anvin 2007-06-21 7:38 ` Albert Cahalan 0 siblings, 1 reply; 28+ messages in thread From: H. Peter Anvin @ 2007-06-21 3:32 UTC (permalink / raw) To: Albert Cahalan; +Cc: William Lee Irwin III, linux-kernel Albert Cahalan wrote: >> >> That's fine. That's a policy decision. That's what a security policy >> *is*. The owner of the system has decided, by security policy, that >> that is not allowed. Bypassing that is not acceptable. > > Fixing a bug should be acceptable. > That's not what you're trying to do, though. You're trying to change the behaviour underneath the security policy. If there is a bug, it's in the security policy and that's where it needs to be changed. > Look, let's back up a bit here. At a high level, what exactly do > you imagine that this behavior was intended for? I suggest you > list some examples of the attacks that are blocked. > > Can you come up with a reasonable argument that the current behavior > is the least painful restriction required to block those attacks? > Does the current behavior block any attack that the proposed behavior > would not? (list the attacks please) See above. -hpa ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: JIT emulator needs 2007-06-21 3:32 ` H. Peter Anvin @ 2007-06-21 7:38 ` Albert Cahalan 0 siblings, 0 replies; 28+ messages in thread From: Albert Cahalan @ 2007-06-21 7:38 UTC (permalink / raw) To: H. Peter Anvin; +Cc: William Lee Irwin III, linux-kernel On 6/20/07, H. Peter Anvin <hpa@zytor.com> wrote: > Albert Cahalan wrote: > > Look, let's back up a bit here. At a high level, what exactly do > > you imagine that this behavior was intended for? I suggest you > > list some examples of the attacks that are blocked. > > > > Can you come up with a reasonable argument that the current behavior > > is the least painful restriction required to block those attacks? > > Does the current behavior block any attack that the proposed behavior > > would not? (list the attacks please) > > See above. Nope. I asked you to justify the existing behavior. Apparently you are unable to do so. This should be a hint. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: JIT emulator needs 2007-06-20 16:01 ` William Lee Irwin III 2007-06-20 16:37 ` H. Peter Anvin @ 2007-06-20 18:43 ` Albert Cahalan 1 sibling, 0 replies; 28+ messages in thread From: Albert Cahalan @ 2007-06-20 18:43 UTC (permalink / raw) To: William Lee Irwin III; +Cc: linux-kernel On 6/20/07, William Lee Irwin III <wli@holomorphy.com> wrote: > On 6/19/07, William Lee Irwin III <wli@holomorphy.com> wrote: >>> If the policy forbidding self-modifying code lacks a method of >>> exempting programs such as JIT interpreters (which I doubt) then >>> it's a problem. I'm with Alan on this one. > > On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote: >> It does and it doesn't. There is not a reasonable way for a >> user to mark an app as needing full self-modifying ability. >> It's not like the executable stack, which can be set via the >> ELF note markings on the executable. (ELF note markings are >> ideal because they can not be used via a ret-to-libc attack) >> With admin privs, one can change SE Linux settings. Mark the >> executable, disable the protection system-wide, generate a >> completely new SE Linux policy, or just turn SE Linux off. >> Normally we don't expect/require admin privs to install an >> executable in one's own ~/bin directory. This is broken. >> It ought to be easier to get a JIT working well without >> enabling arbitrary mprotect. This would allow a JIT to >> partially benefit from the recent security enhancements. >> (think of all the buggy browser-based JIT things!) > > I presumed an ELF note or extended filesystem attributes were already > in place for this sort of affair. It may be that the model implemented > is so restrictive that users are forbidden to create new executables, > in which case using a different model is certainly in order. Otherwise > the ELF note or attributes need to be implemented. Users can create executables. Some will be non-functional unless specially marked by an admin. What is the goal here? 
I see no reasonable goal that would result in such a policy. > On 6/19/07, William Lee Irwin III <wli@holomorphy.com> wrote: >>> This sort of logic might be appropriate for a sort of parametrized >>> and specialized vma allocator setting the policy in /proc/ along >>> with various sorts of limits. There are limits to such and at some >>> point things will have to manually manage their own process address >>> spaces in a platform-specific fashion. If kernel assistance here is >>> rejected they may have to do so in all cases. > > On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote: >> I prefer ELF notes (for start-up allocations) and prctl, >> plus a mmap flag for per-allocation behavior. > > Beware that the kernel (upstream of me) will likely refuse to support > exotic mmap() placement policies. At that point userspace will have > to implement them itself with a front-end to mmap(). > > Userspace can actually live without kernel placement support for > everything but the executable itself, which is already implemented via > ELF loading standards. This is not to downplay the tremendous amounts > of pain involved for moving the stack, getting ld.so to land in the > right place, and so on. Actually I'm less sure about .interp placement. > In any event, exotic virtualspace allocation policies are largely yet > another "simple matter of programming" implementable entirely in > userspace. When you go that route, you may need to abandon libc. I've done exactly that for one emulator. It was not easy. Nearly nobody will want to go down that path. Things improve a bit if MAP_ANONYMOUS and SysV shared mem allocations can be made to ignore the available memory checking. If I could allocate a 2 GB chunk on a system with 1 GB total swap+RAM, then I could use that as an area in which to perform MAP_FIXED allocations. As of now this would require either adding the swap space or disabling the available memory checking system-wide via sysctl. 
> On 6/19/07, William Lee Irwin III <wli@holomorphy.com> wrote: >>> This is a bad idea. The standard semantics are needed for programs >>> relying upon them. > > On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote: >> I didn't mean that the default default :-) setting would change. >> I meant that people could change the behavior from a boot script. >> Things that break are really foul and nasty anyway, probably with >> serious problems that ought to get fixed. > > It's actually not a good idea to make it the default even via sysctl. > People won't realize something will break until it does, and what will > break is likely to be a database responsible for data integrity. The > IPC_RMID creation flag should suffice. It's highly unlikely that such breakage would cause corruption. Most likely it would cause the database to exit with an error about failing to attach to a SysV shared memory segment. I believe that a major cause of reboots is that admins are unaware of SysV shared memory cruft left behind by apps that crashed at the wrong moment or had other bugs. If something is eating memory and you don't know what it is, you reboot. > On 6/19/07, William Lee Irwin III <wli@holomorphy.com> wrote: >>> This is MADV_REMOVE, though most filesystems don't support it. Do you >>> need it for more than tmpfs? > > On Tue, Jun 19, 2007 at 11:16:29PM -0400, Albert Cahalan wrote: >> Yes and no. It's painful to be restricted to one backing store. >> Covering MAP_ANONYMOUS and SysV shared mem is most critical. >> I suppose that other filesystems may require multiple flags to >> deal with the desire to (not) punch a hole on disk and what to >> do if that isn't possible. > > If those two are the bare necessities, they're already in place. Well NONE of this stuff is absolutely required to run a JIT, and one doesn't even need a JIT if one likes pure emulation. All of this is about optimization and failure clean-up. 
MAP_ANONYMOUS and SysV shared mem are good for transient things. Sometimes a JIT author wants to keep a persistent image on disk. In this case, it is much better to use the disk as backing store. Also, sometimes one prefers to use a specific filesystem because swap may be slower, smaller, or of unknown quality. BTW, a mdup2 is great for DSP algorithms as well. It can allow for wrap-around arrays, greatly simplifying and speeding up things like filters. ^ permalink raw reply [flat|nested] 28+ messages in thread
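The wrap-around array idea can be approximated without a new mdup2() syscall by falling back on the "mmap a file twice" workaround described at the start of the thread. The following is a minimal sketch under stated assumptions: ring_map() is a hypothetical name, the temporary file is created under /tmp (an arbitrary choice, and exactly the kind of "known writable filesystem" dependency the thread complains about), and len is assumed to be a multiple of the page size.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map the same `len` bytes twice, back to back, so
 * that base[i] and base[i + len] alias the same memory. Returns the base
 * address, or NULL on failure. */
static unsigned char *ring_map(size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && len > 0 && len % (size_t)page == 0);

    char path[] = "/tmp/ringXXXXXX";      /* assumed location */
    int fd = mkstemp(path);
    if (fd < 0)
        return NULL;
    unlink(path);                         /* no name survives a crash after this */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return NULL;
    }

    /* Reserve 2*len of contiguous address space first. */
    unsigned char *base = mmap(NULL, 2 * len, PROT_NONE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return NULL;
    }

    /* Overlay both halves of the reservation with the same file pages. */
    void *lo = mmap(base, len, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_FIXED, fd, 0);
    void *hi = mmap(base + len, len, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd);                            /* the mappings keep the file alive */
    if (lo == MAP_FAILED || hi == MAP_FAILED) {
        munmap(base, 2 * len);
        return NULL;
    }
    return base;
}
```

A filter can then read or write up to len bytes starting at any offset below len with a single memcpy() and no explicit wrap test. Note the window between mkstemp() and unlink(): a crash there leaves the file behind, which is the clean-up problem the proposed open() flag is meant to remove.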
* Re: JIT emulator needs 2007-06-19 15:08 ` William Lee Irwin III 2007-06-20 3:16 ` Albert Cahalan @ 2007-06-23 3:52 ` Kyle Moffett 2007-06-24 4:14 ` William Lee Irwin III 1 sibling, 1 reply; 28+ messages in thread From: Kyle Moffett @ 2007-06-23 3:52 UTC (permalink / raw) To: William Lee Irwin III; +Cc: Albert Cahalan, linux-kernel, Al Viro On Jun 19, 2007, at 11:08:24, William Lee Irwin III wrote: > On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote: >> c. open() flag to unlink a file before returning the fd > > You probably want a tmpfile(3) -like affair which never has a > pathname to begin with. It could be useful for security purposes > more generally. maybe this: open("/some/dir", O_TMPFILE); and this? open("/some/dir", O_TMPFILE|O_DIRECTORY); The former would return a filehandle to a new anonymous file somewhere on whatever filesystem backs the specified path. The latter would do the same, except create an anonymous directory where you could use "openat()" or something. Presumably "lsof" and "/proc" should show either type of handle as referring to either "/some/ filesystem/" or "/some/filesystem/ (anonymous temp file)" or something. Cheers, Kyle Moffett ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: JIT emulator needs 2007-06-23 3:52 ` Kyle Moffett @ 2007-06-24 4:14 ` William Lee Irwin III 0 siblings, 0 replies; 28+ messages in thread From: William Lee Irwin III @ 2007-06-24 4:14 UTC (permalink / raw) To: Kyle Moffett; +Cc: Albert Cahalan, linux-kernel, Al Viro On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote: >>> c. open() flag to unlink a file before returning the fd On Jun 19, 2007, at 11:08:24, William Lee Irwin III wrote: >> You probably want a tmpfile(3) -like affair which never has a >> pathname to begin with. It could be useful for security purposes >> more generally. On Fri, Jun 22, 2007 at 11:52:12PM -0400, Kyle Moffett wrote: > maybe this: open("/some/dir", O_TMPFILE); > and this? open("/some/dir", O_TMPFILE|O_DIRECTORY); > The former would return a filehandle to a new anonymous file > somewhere on whatever filesystem backs the specified path. The > latter would do the same, except create an anonymous directory where > you could use "openat()" or something. Presumably "lsof" and "/proc" > should show either type of handle as referring to either "/some/ > filesystem/" or "/some/filesystem/ (anonymous temp file)" or something. This is plausible (and I did indeed consider the file variant), though it may require more infrastructure than for tmpfs only. It may be worth clarifying that I have no concrete plans to work on the JIT emulator issues myself. I'm only disseminating ideas I think will pass review. I expect others to take up the issue(s) perhaps with some inspiration from what I described. I may review some, but I have a large review backlog as things now stand. -- wli ^ permalink raw reply [flat|nested] 28+ messages in thread
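For reference, Linux later gained an open() flag named O_TMPFILE (in 3.11) that behaves much like the file variant sketched by Kyle; the O_TMPFILE|O_DIRECTORY variant from his message is not something this sketch relies on. The example below is a hedged illustration only: anon_file() is a hypothetical helper, and it falls back to mkstemp() plus unlink() when the flag is unavailable or the filesystem refuses it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: return a read/write fd for a file that has no
 * pathname, backed by whatever filesystem holds `dir`. Returns -1 on
 * failure. */
static int anon_file(const char *dir)
{
    assert(dir != NULL);

#ifdef O_TMPFILE
    /* Preferred path: the file never has a name at all. */
    int fd = open(dir, O_TMPFILE | O_RDWR, 0600);
    if (fd >= 0)
        return fd;
#endif

    /* Fallback: create a named file and unlink it immediately. A crash
     * between the two calls leaves the file behind. */
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/anonXXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    int fd2 = mkstemp(path);
    if (fd2 < 0)
        return -1;
    unlink(path);
    return fd2;
}
```

The returned descriptor can be passed to ftruncate() and mmap() like any other file, so it could stand in for the large temporary file in the double-mapping workaround.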
* Re: JIT emulator needs 2007-06-08 6:35 JIT emulator needs Albert Cahalan ` (3 preceding siblings ...) 2007-06-19 15:08 ` William Lee Irwin III @ 2007-06-21 17:44 ` Arjan van de Ven 2007-06-22 5:56 ` Albert Cahalan 4 siblings, 1 reply; 28+ messages in thread From: Arjan van de Ven @ 2007-06-21 17:44 UTC (permalink / raw) To: Albert Cahalan; +Cc: linux-kernel On Fri, 2007-06-08 at 02:35 -0400, Albert Cahalan wrote: > Right now, Linux isn't all that friendly to JIT emulators. > Here are the problems and suggestions to improve the situation. > > There is an SE Linux execmem restriction that enforces W^X. > Assuming you don't wish to just disable SE Linux, there are > two ugly ways around the problem. You can mmap a file twice, > or you can abuse SysV shared memory. The mmap method requires > that you know of a filesystem mounted rw,exec where you can > write a very large temporary file. This arbitrary filesystem, > rather than swap space, will be the backing store. The SysV > shared memory method requires an undocumented flag and is > subject to some annoying size limits. Both methods create > objects that will fail to be deleted if the program dies > before marking the objects for deletion. and these methods also destroy yourself on any machine with a looser cache coherency between I and D-cache.... for all but x86 you pretty much have to do the mprotect() between the two states to deal with the cache flushing properly... ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: JIT emulator needs 2007-06-21 17:44 ` Arjan van de Ven @ 2007-06-22 5:56 ` Albert Cahalan 2007-06-22 13:43 ` Arjan van de Ven 0 siblings, 1 reply; 28+ messages in thread From: Albert Cahalan @ 2007-06-22 5:56 UTC (permalink / raw) To: Arjan van de Ven; +Cc: linux-kernel On 6/21/07, Arjan van de Ven <arjan@infradead.org> wrote: > On Fri, 2007-06-08 at 02:35 -0400, Albert Cahalan wrote: > > Right now, Linux isn't all that friendly to JIT emulators. > > Here are the problems and suggestions to improve the situation. > > > > There is an SE Linux execmem restriction that enforces W^X. > > Assuming you don't wish to just disable SE Linux, there are > > two ugly ways around the problem. You can mmap a file twice, > > or you can abuse SysV shared memory. The mmap method requires > > that you know of a filesystem mounted rw,exec where you can > > write a very large temporary file. This arbitrary filesystem, > > rather than swap space, will be the backing store. The SysV > > shared memory method requires an undocumented flag and is > > subject to some annoying size limits. Both methods create > > objects that will fail to be deleted if the program dies > > before marking the objects for deletion. > > and these methods also destroy yourself on any machine with a looser > cache coherency between I and D-cache.... > > for all but x86 you pretty much have to do the mprotect() between the > two states to deal with the cache flushing properly... If the instructions to force data write-back and/or to invalidate the instruction cache are privileged, yes. AFAIK, only ARM is that lame. For example, PowerPC lets unprivileged code run the required instructions. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: JIT emulator needs 2007-06-22 5:56 ` Albert Cahalan @ 2007-06-22 13:43 ` Arjan van de Ven 2007-06-22 14:32 ` Albert Cahalan 0 siblings, 1 reply; 28+ messages in thread From: Arjan van de Ven @ 2007-06-22 13:43 UTC (permalink / raw) To: Albert Cahalan; +Cc: linux-kernel On Fri, 2007-06-22 at 01:56 -0400, Albert Cahalan wrote: > On 6/21/07, Arjan van de Ven <arjan@infradead.org> wrote: > > On Fri, 2007-06-08 at 02:35 -0400, Albert Cahalan wrote: > > > Right now, Linux isn't all that friendly to JIT emulators. > > > Here are the problems and suggestions to improve the situation. > > > > > > There is an SE Linux execmem restriction that enforces W^X. > > > Assuming you don't wish to just disable SE Linux, there are > > > two ugly ways around the problem. You can mmap a file twice, > > > or you can abuse SysV shared memory. The mmap method requires > > > that you know of a filesystem mounted rw,exec where you can > > > write a very large temporary file. This arbitrary filesystem, > > > rather than swap space, will be the backing store. The SysV > > > shared memory method requires an undocumented flag and is > > > subject to some annoying size limits. Both methods create > > > objects that will fail to be deleted if the program dies > > > before marking the objects for deletion. > > > > and these methods also destroy yourself on any machine with a looser > > cache coherency between I and D-cache.... > > > > for all but x86 you pretty much have to do the mprotect() between the > > two states to deal with the cache flushing properly... > > If the instructions to force data write-back and/or to > invalidate the instruction cache are privileged, yes. > AFAIK, only ARM is that lame. and your program executes this on all the cpus in the system? -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: JIT emulator needs 2007-06-22 13:43 ` Arjan van de Ven @ 2007-06-22 14:32 ` Albert Cahalan 2007-06-22 14:42 ` Arjan van de Ven 0 siblings, 1 reply; 28+ messages in thread From: Albert Cahalan @ 2007-06-22 14:32 UTC (permalink / raw) To: Arjan van de Ven; +Cc: linux-kernel On 6/22/07, Arjan van de Ven <arjan@infradead.org> wrote: > On Fri, 2007-06-22 at 01:56 -0400, Albert Cahalan wrote: > > On 6/21/07, Arjan van de Ven <arjan@infradead.org> wrote: > > > On Fri, 2007-06-08 at 02:35 -0400, Albert Cahalan wrote: > > > > Right now, Linux isn't all that friendly to JIT emulators. > > > > Here are the problems and suggestions to improve the situation. > > > > > > > > There is an SE Linux execmem restriction that enforces W^X. > > > > Assuming you don't wish to just disable SE Linux, there are > > > > two ugly ways around the problem. You can mmap a file twice, > > > > or you can abuse SysV shared memory. The mmap method requires > > > > that you know of a filesystem mounted rw,exec where you can > > > > write a very large temporary file. This arbitrary filesystem, > > > > rather than swap space, will be the backing store. The SysV > > > > shared memory method requires an undocumented flag and is > > > > subject to some annoying size limits. Both methods create > > > > objects that will fail to be deleted if the program dies > > > > before marking the objects for deletion. > > > > > > and these methods also destroy yourself on any machine with a looser > > > cache coherency between I and D-cache.... > > > > > > for all but x86 you pretty much have to do the mprotect() between the > > > two states to deal with the cache flushing properly... > > > > If the instructions to force data write-back and/or to > > invalidate the instruction cache are privileged, yes. > > AFAIK, only ARM is that lame. > > and your program executes this on all the cpus in the system? I'll remember that if I ever run a JIT on the SMP ARM box. (there's like one, at the manufacturer, right?) 
I don't recall seeing such code in the libgcc trampoline setup for PowerPC. Either it's not required, or this is a rather popular bug. Perhaps ARM needs syscalls for this, or emulation for the privileged instructions. This may already exist; it sure is required. So this would be another need for properly supporting JIT emulators. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: JIT emulator needs 2007-06-22 14:32 ` Albert Cahalan @ 2007-06-22 14:42 ` Arjan van de Ven 2007-06-23 2:30 ` Albert Cahalan 0 siblings, 1 reply; 28+ messages in thread From: Arjan van de Ven @ 2007-06-22 14:42 UTC (permalink / raw) To: Albert Cahalan; +Cc: linux-kernel > > > > and these methods also destroy yourself on any machine with a looser > > > > cache coherency between I and D-cache.... > > > > > > > > for all but x86 you pretty much have to do the mprotect() between the > > > > two states to deal with the cache flushing properly... > > > > > > If the instructions to force data write-back and/or to > > > invalidate the instruction cache are privileged, yes. > > > AFAIK, only ARM is that lame. > > > > and your program executes this on all the cpus in the system? no I meant that you had to call your userspace instruction on all cpus, so on all-but-arm (from the Intel side I know IA64 needs such a flush, but I'm pretty sure PPC does too) > I don't recall seeing such code in the libgcc trampoline > setup for PowerPC. Either it's not required, or this is > a rather popular bug. I suspect it'll be playing under the assumption that going from "no code" to "code" is fine since the icache is cold. -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: JIT emulator needs 2007-06-22 14:42 ` Arjan van de Ven @ 2007-06-23 2:30 ` Albert Cahalan 0 siblings, 0 replies; 28+ messages in thread From: Albert Cahalan @ 2007-06-23 2:30 UTC (permalink / raw) To: Arjan van de Ven; +Cc: linux-kernel On 6/22/07, Arjan van de Ven <arjan@infradead.org> wrote: > > > > > and these methods also destroy yourself on any machine with a looser > > > > > cache coherency between I and D-cache.... > > > > > > > > > > for all but x86 you pretty much have to do the mprotect() between the > > > > > two states to deal with the cache flushing properly... > > > > > > > > If the instructions to force data write-back and/or to > > > > invalidate the instruction cache are privileged, yes. > > > > AFAIK, only ARM is that lame. > > > > > > and your program executes this on all the cpus in the system? > > no I meant that you had to call your userspace instruction on all cpus, > so on all-but-arm (from the Intel side I know IA64 needs such a flush, > but I'm pretty sure PPC does too) I understood. AFAIK, it is common to propagate this via a special bus cycle. Section 5.1.5.2.1 of the PowerPC manual states that this is so. Section 5.1.5.2 lists the requirements for both uniprocessor and multiprocessor. Note that Linux uses the coherent memory model for PowerPC SMP. See also the "icbi" instruction description, where the use of an address-only broadcast is mentioned. > > I don't recall seeing such code in the libgcc trampoline > > setup for PowerPC. Either it's not required, or this is > > a rather popular bug. > > I suspect it'll be playing under the assumption that going from "no > code" to "code" is fine since the icache is cold. A previous trampoline would ruin that. Fortunately, PowerPC is not as brain-dead as ARM and IA64. (not that I'm writing code for any of these) ^ permalink raw reply [flat|nested] 28+ messages in thread
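The sequence being argued about (write code, make it executable, synchronize the instruction cache, jump) can be sketched with the GCC/Clang builtin __builtin___clear_cache(), which emits whatever the target needs and is effectively a no-op on x86. This is an illustration under stated assumptions, not a statement about what any particular JIT does: jit_return_42() is a hypothetical name, the machine code is hand-encoded for x86-64 and AArch64 only, and the single-mapping mprotect() flip used here is the step an execmem-style W^X policy may refuse.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: emit "return 42" into fresh memory, flip the page
 * from read/write to read/exec, synchronize I/D caches, and call it.
 * Returns 42 on success, -1 on failure or on an unsupported target. */
static int jit_return_42(void)
{
#if defined(__x86_64__)
    /* mov eax, 42 ; ret */
    static const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
#elif defined(__aarch64__)
    /* mov w0, #42 ; ret */
    static const uint32_t code[] = { 0x52800540u, 0xD65F03C0u };
#else
    return -1;
#endif
#if defined(__x86_64__) || defined(__aarch64__)
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && sizeof code <= (size_t)page);
    size_t len = (size_t)page;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memcpy(p, code, sizeof code);

    /* W -> X transition; a strict policy may deny this call. */
    if (mprotect(p, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(p, len);
        return -1;
    }

    /* Make the new instructions visible to instruction fetch. */
    __builtin___clear_cache((char *)p, (char *)p + sizeof code);

    int (*fn)(void) = (int (*)(void))(uintptr_t)p;
    int r = fn();

    munmap(p, len);
    return r;
#endif
}
```

With the two-mapping schemes from the start of the thread, the write and the execute happen through different virtual addresses; which address range must be passed to the cache-synchronizing step on a given architecture is exactly the open question in this exchange, and the sketch does not settle it.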
[parent not found: <8tGiE-2Hv-1@gated-at.bofh.it>]
[parent not found: <8xNvm-2Tw-29@gated-at.bofh.it>]
[parent not found: <8xYTM-3So-13@gated-at.bofh.it>]
* Re: JIT emulator needs [not found] ` <8xYTM-3So-13@gated-at.bofh.it> @ 2007-06-21 11:08 ` Bodo Eggert 0 siblings, 0 replies; 28+ messages in thread From: Bodo Eggert @ 2007-06-21 11:08 UTC (permalink / raw) To: Albert Cahalan, William Lee Irwin III, linux-kernel Albert Cahalan <acahalan@gmail.com> wrote: > On 6/19/07, William Lee Irwin III <wli@holomorphy.com> wrote: >> On Fri, Jun 08, 2007 at 02:35:22AM -0400, Albert Cahalan wrote: >>> Right now, Linux isn't all that friendly to JIT emulators. >>> Here are the problems and suggestions to improve the situation. >>> There is an SE Linux execmem restriction that enforces W^X. >>> Assuming you don't wish to just disable SE Linux, there are >>> two ugly ways around the problem. You can mmap a file twice, >>> or you can abuse SysV shared memory. The mmap method requires >>> that you know of a filesystem mounted rw,exec where you can >>> write a very large temporary file. This arbitrary filesystem, >>> rather than swap space, will be the backing store. The SysV >>> shared memory method requires an undocumented flag and is >>> subject to some annoying size limits. Both methods create >>> objects that will fail to be deleted if the program dies >>> before marking the objects for deletion. >> >> If the policy forbidding self-modifying code lacks a method of >> exempting programs such as JIT interpreters (which I doubt) then >> it's a problem. I'm with Alan on this one. > > It does and it doesn't. There is not a reasonable way for a > user to mark an app as needing full self-modifying ability. > It's not like the executable stack, which can be set via the > ELF note markings on the executable. (ELF note markings are > ideal because they can not be used via a ret-to-libc attack) > > With admin privs, one can change SE Linux settings. Mark the > executable, disable the protection system-wide, generate a > completely new SE Linux policy, or just turn SE Linux off. 

According to the documents I found about SELinux, you can also
- create a this-app-needs-selfmodification type
- allow users to change the context type of their files to this type
- configure a domain to allow self-modification
- configure the domain transition

Brave words from someone who has not yet managed to find the magic
needed to install the refpolicy on debilian (after finding their
refpolicy-foo to be incomplete and their refpolicy-src to not compile).
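
Whether a domain configured this way really permits self-modification can be checked from inside the program. The following is only a rough sketch, not something taken from the SELinux documents: it assumes that the execmem restriction shows up as a failed mmap() when an anonymous mapping is requested writable and executable at once. The function name probe_wx_mapping() is made up for illustration.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on strict compilers */
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask for one anonymous page that is readable,
 * writable and executable at the same time.
 *
 * Returns 1 if the kernel granted the mapping, 0 if it refused.
 * On refusal errno is left as set by mmap() (assumption: a policy
 * denial is reported as EACCES; other values mean other failures).
 */
static int probe_wx_mapping(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) {
        errno = EINVAL;
        return 0;
    }

    void *p = mmap(NULL, (size_t)page,
                   PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 0;                /* errno describes why it was refused */

    /* The probe only tests permission; release the page again. */
    munmap(p, (size_t)page);
    return 1;
}
```

A JIT could call this once at startup and fall back to the double-mapping workaround when it returns 0. It only tells you what the current domain allows; it does not change any policy.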

end of thread, other threads:[~2007-06-24 4:13 UTC | newest]

Thread overview: 28+ messages
2007-06-08  6:35 JIT emulator needs Albert Cahalan
2007-06-08  7:09 ` Eric Dumazet
2007-06-09  4:12 ` Albert Cahalan
2007-06-08 11:10 ` Alan Cox
2007-06-08 16:35 ` Nicholas Miell
2007-06-09  5:17 ` Albert Cahalan
2007-06-09 20:00 ` H. Peter Anvin
2007-06-19 15:08 ` William Lee Irwin III
2007-06-20  3:16 ` Albert Cahalan
2007-06-20 16:01 ` William Lee Irwin III
2007-06-20 16:37 ` H. Peter Anvin
2007-06-20 17:54 ` William Lee Irwin III
2007-06-20 18:23 ` H. Peter Anvin
2007-06-20 18:25 ` Albert Cahalan
2007-06-20 18:51 ` H. Peter Anvin
2007-06-21  3:21 ` Albert Cahalan
2007-06-21  3:32 ` H. Peter Anvin
2007-06-21  7:38 ` Albert Cahalan
2007-06-20 18:43 ` Albert Cahalan
2007-06-23  3:52 ` Kyle Moffett
2007-06-24  4:14 ` William Lee Irwin III
2007-06-21 17:44 ` Arjan van de Ven
2007-06-22  5:56 ` Albert Cahalan
2007-06-22 13:43 ` Arjan van de Ven
2007-06-22 14:32 ` Albert Cahalan
2007-06-22 14:42 ` Arjan van de Ven
2007-06-23  2:30 ` Albert Cahalan
2007-06-21 11:08 ` Bodo Eggert