* Re: CD writing in future Linux (stirring up a hornets' nest) (was: Rationale for RLIMIT_MEMLOCK?)
From: Bodo Eggert @ 2006-01-24 9:14 UTC
To: Joerg Schilling, rlrevell, matthias.andree, schilling, linux-kernel

Joerg Schilling <schilling@fokus.fraunhofer.de> wrote:

[...]
> On Solaris, you (currently) use a profile enabled shell (pfsh, pfksh or pfcsh)
> that calls getexecuser() in order to find whether there is a specific
> treatment needed. If this specific treatment is needed, then the shell calls
> execve(/usr/bin/pfexec cmd <args>)
> else it calls execve(cmd <args>)
>
> I recently voted to require all shells to be profile enabled by default.

Why? I assume there will only be a few programs that need to be run by a
wrapper, and
  mv /usr/bin/foo /usr/pfexec-bin/foo;
  echo $'#!/bin/sh\n/usr/sbin/pfexec /usr/pfexec-bin/foo "$@"' > /usr/bin/foo;
  chmod 755 /usr/bin/foo
should be easier than patching e.g. all callers of cdrecord, and it won't
slow down starting non-profiled applications.

Possibly pfexec can tell which application to run by the basename (like
su1); in this case you'd add something like
"alias cdrecord /opt/schily/bin/cdrecord" to its configuration and link it
to /usr/bin/cdrecord.

> With the future plans for extending fine grained privs on Solaris, sending
> SCSI commands will become more than one priv.
>
> I proposed to have a low priv right to send commands like inquiry and test
> unit ready. These commands may e.g. be sent without interfering with a
> concurrent CD/DVD write operation.
>
> The next priv could be the permission for sending simple SCSI commands that
> allow reading from the device.
>
> The next priv could be the permission for sending simple SCSI commands that
> allow writing.
>
> The final priv would allow even vendor specific commands: this is what
> cdrecord needs.

That sounds reasonable, but I wonder how you can get access to a device
file descriptor in order to do unprivileged access.

--
I thank GMX for sabotaging the use of my addresses by means of lies spread
via SPF.
* Re: CD writing in future Linux (stirring up a hornets' nest) (was: Rationale for RLIMIT_MEMLOCK?)
From: Joerg Schilling @ 2006-01-24 14:38 UTC
To: schilling, rlrevell, matthias.andree, linux-kernel, 7eggert

Bodo Eggert <harvested.in.lkml@7eggert.dyndns.org> wrote:

> Joerg Schilling <schilling@fokus.fraunhofer.de> wrote:
>
> [...]
> > On Solaris, you (currently) use a profile enabled shell (pfsh, pfksh or pfcsh)
> > that calls getexecuser() in order to find whether there is a specific
> > treatment needed. If this specific treatment is needed, then the shell calls
> > execve(/usr/bin/pfexec cmd <args>)
> > else it calls execve(cmd <args>)
> >
> > I recently voted to require all shells to be profile enabled by default.
>
> Why? I assume there will only be a few programs that need to be run by a
> wrapper, and
>   mv /usr/bin/foo /usr/pfexec-bin/foo;
>   echo $'#!/bin/sh\n/usr/sbin/pfexec /usr/pfexec-bin/foo "$@"' > /usr/bin/foo;
>   chmod 755 /usr/bin/foo
> should be easier than patching e.g. all callers of cdrecord, and it won't
> slow down starting non-profiled applications.

Because the architecture review committee decided this would be the right way.

Note that we are migrating from the classical root/non-root UNIX model to
fine-grained privilege handling. The current documentation says that you
need to have a profile enabled shell as your SHELL in order to be able to
use a root-less Solaris.

> Possibly pfexec can tell which application to run by the basename (like
> su1); in this case you'd add something like
> "alias cdrecord /opt/schily/bin/cdrecord" to its configuration and link it
> to /usr/bin/cdrecord.

But you are right that another way would be to use something like "isaexec".

> > The final priv would allow even vendor specific commands: this is what
> > cdrecord needs.
>
> That sounds reasonable, but I wonder how you can get access to a device
> file descriptor in order to do unprivileged access.

This is something that needs to be discussed. Last night, I found that there
should be a way to run cdrecord without the need to have the "file_dac_read"
privilege. I'll discuss this with the security group.

Jörg

--
EMail:joerg@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
      js@cs.tu-berlin.de (uni)
      schilling@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
* Re: CD writing in future Linux (stirring up a hornets' nest)
From: Bodo Eggert @ 2006-01-24 17:44 UTC
To: Joerg Schilling; Cc: rlrevell, matthias.andree, linux-kernel, 7eggert

On Tue, 24 Jan 2006, Joerg Schilling wrote:
> Bodo Eggert <harvested.in.lkml@7eggert.dyndns.org> wrote:
> > Joerg Schilling <schilling@fokus.fraunhofer.de> wrote:

> > [...]
> > > On Solaris, you (currently) use a profile enabled shell (pfsh, pfksh or pfcsh)
> > > that calls getexecuser() in order to find whether there is a specific
> > > treatment needed. If this specific treatment is needed, then the shell calls
> > > execve(/usr/bin/pfexec cmd <args>)
> > > else it calls execve(cmd <args>)
> > >
> > > I recently voted to require all shells to be profile enabled by default.
> >
> > Why? I assume there will only be a few programs that need to be run by a
> > wrapper, and
> >   mv /usr/bin/foo /usr/pfexec-bin/foo;
> >   echo $'#!/bin/sh\n/usr/sbin/pfexec /usr/pfexec-bin/foo "$@"' > /usr/bin/foo;
> >   chmod 755 /usr/bin/foo
> > should be easier than patching e.g. all callers of cdrecord, and it won't
> > slow down starting non-profiled applications.
>
> Because the architecture review committee decided this would be the right way.
>
> Note that we are migrating from the classical root/non-root UNIX model to
> fine-grained privilege handling. The current documentation says that you
> need to have a profile enabled shell as your SHELL in order to be able to
> use a root-less Solaris.

If the shell were the only program calling cdrecord, this would work out as
expected.

--
My mail reader can beat up your mail reader.
* Re: Rationale for RLIMIT_MEMLOCK?
From: Bodo Eggert @ 2006-01-26 0:12 UTC
To: Joerg Schilling, schilling, matthias.andree, linux-kernel, tytso, arjan

Joerg Schilling <schilling@fokus.fraunhofer.de> wrote:

> I could add this piece of code to the euid == 0 part of cdrecord:
>
> LOCAL void
> raise_memlock()
> {
> #ifdef RLIMIT_MEMLOCK
>         struct rlimit rlim;
>
>         rlim.rlim_cur = rlim.rlim_max = RLIM_INFINITY;

I think you should rather use the size you're going to mlock, or at least
its upper bound.

--
I thank GMX for sabotaging the use of my addresses by means of lies spread
via SPF.
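A minimal sketch of the suggestion above, assuming the caller knows an upper bound on the number of bytes it will lock. The names `round_up_to_page` and `raise_memlock_to` are hypothetical and are not taken from cdrecord; the sketch only raises the limit when the current soft limit is too small, and raising the hard limit still requires privilege.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <unistd.h>

/* Round `bytes` up to a whole number of pages. */
size_t round_up_to_page(size_t bytes, size_t pagesize)
{
    assert(pagesize > 0);
    return (bytes + pagesize - 1) / pagesize * pagesize;
}

/*
 * Raise RLIMIT_MEMLOCK so that at least `need` bytes may be locked,
 * instead of setting it to RLIM_INFINITY.
 * Returns 0 on success (or if nothing had to change), -1 on failure.
 */
int raise_memlock_to(size_t need)
{
#ifdef RLIMIT_MEMLOCK
    struct rlimit rlim;
    long pagesize = sysconf(_SC_PAGESIZE);
    rlim_t want;

    if (pagesize <= 0)
        return -1;
    want = (rlim_t)round_up_to_page(need, (size_t)pagesize);

    if (getrlimit(RLIMIT_MEMLOCK, &rlim) != 0)
        return -1;

    /* The soft limit is already large enough: leave everything alone. */
    if (rlim.rlim_cur == RLIM_INFINITY || rlim.rlim_cur >= want)
        return 0;

    rlim.rlim_cur = want;
    /* Raising the hard limit only works while still privileged. */
    if (rlim.rlim_max != RLIM_INFINITY && rlim.rlim_max < want)
        rlim.rlim_max = want;

    return setrlimit(RLIMIT_MEMLOCK, &rlim);
#else
    (void)need;
    return 0;
#endif
}
```

Called from the privileged part of a program, for example as `raise_memlock_to(fifo_size + buffer_size)` with hypothetical size variables, this keeps the limit bounded by what the program actually intends to lock.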
* Re: Rationale for RLIMIT_MEMLOCK?
From: Michael Kerrisk @ 2006-02-03 20:49 UTC
To: matthias.andree
Cc: Theodore Ts'o, linux-kernel, arjan, Joerg Schilling, michael.kerrisk

> > Matthias Andree <matthias.andree@gmx.de> wrote:

[...]

> The complete story is, condensed, and with return values, for a
> setuid-root application:
>
> geteuid() == 0;
> mlockall(MCL_CURRENT|MCL_FUTURE) == (success);
> seteuid(500) == (success);
> valloc(64512 + pagesize) == NULL (failure);

[...]

A late follow-up to this thread. I've added the following text
to the mlockall() manual page under BUGS:

    Since kernel 2.6.9, if a privileged process calls
    mlockall(MCL_FUTURE) and later drops privileges
    (CAP_IPC_LOCK), then subsequent memory allocations
    (e.g., mmap(2), sbrk(2)) will fail if the RLIMIT_MEMLOCK
    resource limit is encountered.

The change will be in man-pages 2.23.

Cheers,

Michael

--
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance?
Grab the latest tarball at
ftp://ftp.win.tue.nl/pub/linux-local/manpages/,
read the HOWTOHELP file and grep the source
files for 'FIXME'.
* Rationale for RLIMIT_MEMLOCK?
From: Matthias Andree @ 2006-01-23 10:56 UTC
To: Linux-Kernel mailing list

Greetings,

while debugging an application that called mlockall(...FUTURE) and then
failed in a subsequent mmap(), I came across the manual page for setrlimit
(see below for the relevant excerpt). I have several questions concerning
the rationale:

1. What is the reason we're having special treatment
   for the super-user here?

2. Why is it the opposite of what 2.6.8.1 and earlier did?

3. Why is this inconsistent with all other RLIMIT_*?
   None of those cares whether a process is privileged or not.

4. Is the default hard limit of 32 kB initialized by the kernel or
   by some script in SUSE 10.0? If it's the kernel: why is the limit so
   low, and why isn't just the soft limit set?

"[...]
RLIMIT_MEMLOCK
       The maximum number of bytes of memory that may be locked into
       RAM. In effect this limit is rounded down to the nearest
       multiple of the system page size. This limit affects mlock(2) and
       mlockall(2) and the mmap(2) MAP_LOCKED operation. Since Linux
       2.6.9 it also affects the shmctl(2) SHM_LOCK operation, where it
       sets a maximum on the total bytes in shared memory segments (see
       shmget(2)) that may be locked by the real user ID of the calling
       process. The shmctl(2) SHM_LOCK locks are accounted for
       separately from the per-process memory locks established by
       mlock(2), mlockall(2), and mmap(2) MAP_LOCKED; a process can
       lock bytes up to this limit in each of these two categories. In
       Linux kernels before 2.6.9, this limit controlled the amount of
       memory that could be locked by a privileged process. Since
       Linux 2.6.9, no limits are placed on the amount of memory that a
       privileged process may lock, and this limit instead governs the
       amount of memory that an unprivileged process may lock.
[...]" (getrlimit(2), man-pages-2.07)

--
Matthias Andree
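To see which limit a given system actually applies (question 4), a process can query it with getrlimit(2). The following is a minimal sketch; the helper names `round_down_to_page` and `get_memlock_limits` are hypothetical, and the rounding helper only illustrates the "rounded down to the nearest multiple of the system page size" sentence from the excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Round a byte count down to a multiple of the page size. */
size_t round_down_to_page(size_t bytes, size_t pagesize)
{
    assert(pagesize > 0);
    return bytes - bytes % pagesize;
}

/*
 * Fetch the soft and hard RLIMIT_MEMLOCK values of the calling process.
 * Returns 0 on success, -1 on failure (errno is set by getrlimit()).
 * Either value may be RLIM_INFINITY.
 */
int get_memlock_limits(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rlim;

    assert(soft != NULL && hard != NULL);
    if (getrlimit(RLIMIT_MEMLOCK, &rlim) != 0)
        return -1;
    *soft = rlim.rlim_cur;
    *hard = rlim.rlim_max;
    return 0;
}
```

Running such a check both in a fresh login shell and early in init scripts would be one way to tell whether the value comes from the kernel default or from distribution configuration.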
* Re: Rationale for RLIMIT_MEMLOCK?
From: Arjan van de Ven @ 2006-01-23 11:05 UTC
To: Matthias Andree; Cc: Linux-Kernel mailing list

> 1. What is the reason we're having special treatment
>    for the super-user here?

it's quite common to allow root (or, more specifically, the right capability)
to override rlimits. Many such security checks behave that way, so it's
only "just" to treat this one like that as well.

> 2. Why is it the opposite of what 2.6.8.1 and earlier did?

the earlier behavior didn't really make sense, and gave multimedia apps
cause to run as root just to be able to mlock, etc. Now this can be
dynamically controlled instead.

> 4. Is the default hard limit of 32 kB initialized by the kernel or

the kernel has a relatively low default. The reason is simple: allow too
much mlock and the user can DoS the machine too easily. The kernel default
should be safe; the admin / distro can very easily override it anyway.

You may ask: why is it not zero?
It is very useful for many things to have a "small" mlock area: gpg, ssh
and basically anything that works with keys and passwords. Small relative
to the other resources such a process takes (e.g. kernel stacks etc).
* Re: Rationale for RLIMIT_MEMLOCK?
From: Matthias Andree @ 2006-01-23 16:54 UTC
To: Arjan van de Ven; Cc: Linux-Kernel mailing list, mtk-manpages

On Mon, 23 Jan 2006, Arjan van de Ven wrote:

> > 1. What is the reason we're having special treatment
> >    for the super-user here?
>
> it's quite common to allow root (or, more specifically, the right capability)
> to override rlimits. Many such security checks behave that way, so it's
> only "just" to treat this one like that as well.

Why is RLIMIT_MEMLOCK special enough to warrant special treatment like
this? The right capability should be able to override with setrlimit(2)
anyway, right?

> > 2. Why is it the opposite of what 2.6.8.1 and earlier did?
>
> the earlier behavior didn't really make sense, and gave multimedia apps
> cause to run as root just to be able to mlock, etc. Now this can be
> dynamically controlled instead.

Quoting the manpage: "In Linux kernels before 2.6.9, this limit
controlled the amount of memory that could be locked by a privileged
process."

This is nonsense, and it appears as though 2.6.8 and earlier didn't
apply the limit to unprivileged processes. Should the behavior stay as
inconsistent as it is now, I'd suggest rewording this to "...before 2.6.9,
this limit controlled the amount of memory that could be locked by /any/
process." or something even better if someone can think of such.
(manpages maintainer Cc'd)

> > 4. Is the default hard limit of 32 kB initialized by the kernel or
>
> the kernel has a relatively low default. The reason is simple: allow too
> much mlock and the user can DoS the machine too easily. The kernel default
> should be safe; the admin / distro can very easily override it anyway.

This doesn't appear to happen for SUSE 10.0, which causes trouble with
some of the "multimedia apps" BTW... apparently the limit was lowered at
the same time as the root restrictions were relaxed.

Such changes in behavior aren't appropriate for 2.6.X; there are way too
many applications that can't be bothered to check the patchlevel of the
kernel, and it's totally unintuitive to users, too. Aside from the fact
that most distros have settled on one kernel.

> You may ask: why is it not zero?

No, I'm not doing that. I rather wonder why it's so low, or whom a certain
percentage such as RAM >> 5 (that's 3.125 %) would hurt. Allowing
unlimited memory allocation while at the same time allowing only 32 kB
of mlock()ed memory seems disproportionate to me.

--
Matthias Andree
* Re: Rationale for RLIMIT_MEMLOCK?
From: Arjan van de Ven @ 2006-01-23 17:00 UTC
To: Matthias Andree; Cc: Linux-Kernel mailing list, mtk-manpages

> > > 4. Is the default hard limit of 32 kB initialized by the kernel or
> >
> > the kernel has a relatively low default. The reason is simple: allow too
> > much mlock and the user can DoS the machine too easily. The kernel default
> > should be safe; the admin / distro can very easily override it anyway.
>
> This doesn't appear to happen for SUSE 10.0, which causes trouble with
> some of the "multimedia apps" BTW... apparently the limit was lowered at
> the same time as the root restrictions were relaxed.

yes the behavior is like this

          root                 non-root
before    about half of ram    nothing
after     all of ram           by default small, increasable

> Such changes in behavior aren't appropriate for 2.6.X; there are way too
> many applications that can't be bothered to check the patchlevel of the
> kernel, and it's totally unintuitive to users, too.

there is NO fundamental change here other than a *general* relaxing.
This is important to note: Apps that could mlock before STILL can mlock.
Only apps that depend on mlock failing with a security check, and
only those that lock small portions, break now because suddenly the mlock
succeeds. Big deal... those would have broken when run as root already.

> No, I'm not doing that. I rather wonder why it's so low, or whom a certain
> percentage such as RAM >> 5 (that's 3.125 %) would hurt.

Because it's generally a PER PROCESS limit, so fork 60 times and kaboom,
things explode. (You can argue you can forkbomb anyway, but that's where
the process count rlimit comes in.)

> Allowing
> unlimited memory allocation while at the same time allowing only 32 kB
> of mlock()ed memory seems disproportionate to me.

it's not. Normal memory is swappable, and thus a far less scarce commodity
than precious pinned-down memory.

What application do you have in mind that broke by this relaxing of
rules?
* Re: Rationale for RLIMIT_MEMLOCK?
From: Matthias Andree @ 2006-01-23 18:01 UTC
To: Arjan van de Ven; Cc: Linux-Kernel mailing list

On Mon, 23 Jan 2006, Arjan van de Ven wrote:

> yes the behavior is like this
>
>           root                 non-root
> before    about half of ram    nothing
> after     all of ram           by default small, increasable
>
[...]
> What application do you have in mind that broke by this relaxing of
> rules?

This is not something I'd like to disclose here yet.

It is an application that calls mlockall(MCL_CURRENT|MCL_FUTURE) and
apparently copes with mlockall() returning EPERM (or doesn't even try
it) but can apparently NOT cope with valign() tripping over mmap() ==
-1/EAGAIN. The relevant people are Bcc'd.

--
Matthias Andree
* Re: Rationale for RLIMIT_MEMLOCK?
From: Arjan van de Ven @ 2006-01-23 18:13 UTC
To: Matthias Andree; Cc: Linux-Kernel mailing list

On Mon, 2006-01-23 at 19:01 +0100, Matthias Andree wrote:
> On Mon, 23 Jan 2006, Arjan van de Ven wrote:
>
> > yes the behavior is like this
> >
> >           root                 non-root
> > before    about half of ram    nothing
> > after     all of ram           by default small, increasable
> >
> [...]
> > What application do you have in mind that broke by this relaxing of
> > rules?
>
> This is not something I'd like to disclose here yet.
>
> It is an application that calls mlockall(MCL_CURRENT|MCL_FUTURE) and
> apparently copes with mlockall() returning EPERM

hmm... curious that mlockall() succeeds with only a 32kb rlimit....
* Re: Rationale for RLIMIT_MEMLOCK?
From: Matthias Andree @ 2006-01-23 18:55 UTC
To: Arjan van de Ven; Cc: Linux-Kernel mailing list

On Mon, 23 Jan 2006, Arjan van de Ven wrote:

> hmm... curious that mlockall() succeeds with only a 32kb rlimit....

It's quite obvious given the seteuid() shuffling behind the scenes of the
app: the mlockall() runs with euid==0, and the later mmap() with euid!=0.

Clearly the application should do both with the same privilege or raise
the RLIMIT_MEMLOCK while running with privileges.

The question that's open is one for the libc guys: malloc(), valloc()
and others seem to use mmap() on some occasions (for some allocation
sizes) - at least malloc/malloc.c comments as of 2.3.4 suggest so -, and
if this isn't orthogonal to mlockall() and set[e]uid() calls, glibc
is pretty deeply in trouble if the code calls mlockall(MCL_FUTURE) and
then drops privileges.

The function in question appears to be valloc() with glibc 2.3.5.

In this light, mlockall(MCL_FUTURE) is pretty useless, since there is no
way to undo MCL_FUTURE without unlocking all pages at the same time.
Particularly so for setuid apps...

I'm asking the Bcc'd gentleman to reconsider mlockall() and perhaps use
explicit mlock() instead.

--
Matthias Andree
* Re: Rationale for RLIMIT_MEMLOCK?
From: Arjan van de Ven @ 2006-01-23 19:04 UTC
To: Matthias Andree; Cc: Linux-Kernel mailing list

On Mon, 2006-01-23 at 19:55 +0100, Matthias Andree wrote:
> On Mon, 23 Jan 2006, Arjan van de Ven wrote:
>
> > hmm... curious that mlockall() succeeds with only a 32kb rlimit....
>
> It's quite obvious given the seteuid() shuffling behind the scenes of the
> app: the mlockall() runs with euid==0, and the later mmap() with euid!=0.

hmm how on earth was that supposed to work at all????
* Re: Rationale for RLIMIT_MEMLOCK?
From: Joerg Schilling @ 2006-01-23 19:38 UTC
To: matthias.andree, arjan; Cc: linux-kernel

Matthias Andree <matthias.andree@gmx.de> wrote:

> On Mon, 23 Jan 2006, Arjan van de Ven wrote:
>
> > hmm... curious that mlockall() succeeds with only a 32kb rlimit....
>
> It's quite obvious given the seteuid() shuffling behind the scenes of the
> app: the mlockall() runs with euid==0, and the later mmap() with euid!=0.
>
> Clearly the application should do both with the same privilege or raise
> the RLIMIT_MEMLOCK while running with privileges.
>
> The question that's open is one for the libc guys: malloc(), valloc()
> and others seem to use mmap() on some occasions (for some allocation
> sizes) - at least malloc/malloc.c comments as of 2.3.4 suggest so -, and
> if this isn't orthogonal to mlockall() and set[e]uid() calls, glibc
> is pretty deeply in trouble if the code calls mlockall(MCL_FUTURE) and
> then drops privileges.

If the behavior described by Matthias is true for current Linux kernels,
then there is a clear bug that needs fixing.

If the Linux kernel is not willing to accept the contract by
mlockall(MCL_FUTURE), then it should not accept the call at all.

In our case, the kernel did accept the call to mlockall(MCL_FUTURE), but
later ignored this contract. This bug should be fixed.

Jörg

--
EMail:joerg@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
      js@cs.tu-berlin.de (uni)
      schilling@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
* Re: Rationale for RLIMIT_MEMLOCK?
From: Matthias Andree @ 2006-01-23 20:30 UTC
To: Joerg Schilling; Cc: matthias.andree, arjan, linux-kernel

Joerg Schilling wrote on 2006-01-23:

> Matthias Andree <matthias.andree@gmx.de> wrote:
>
> > On Mon, 23 Jan 2006, Arjan van de Ven wrote:
> >
> > > hmm... curious that mlockall() succeeds with only a 32kb rlimit....
> >
> > It's quite obvious given the seteuid() shuffling behind the scenes of the
> > app: the mlockall() runs with euid==0, and the later mmap() with euid!=0.
> >
> > Clearly the application should do both with the same privilege or raise
> > the RLIMIT_MEMLOCK while running with privileges.
> >
> > The question that's open is one for the libc guys: malloc(), valloc()
> > and others seem to use mmap() on some occasions (for some allocation
> > sizes) - at least malloc/malloc.c comments as of 2.3.4 suggest so -, and
> > if this isn't orthogonal to mlockall() and set[e]uid() calls, glibc
> > is pretty deeply in trouble if the code calls mlockall(MCL_FUTURE) and
> > then drops privileges.
>
> If the behavior described by Matthias is true for current Linux kernels,
> then there is a clear bug that needs fixing.

Jörg elided my lines that said valloc() was the function in question.

Jörg, if we're talking about valloc(), this hasn't much to do with the
kernel, but is a library issue.

There is _no_ documentation that says valloc() or memalign() or
posix_memalign() is required to use mmap(). It works on some systems and
for some allocation sizes as a side effect of the valloc()
implementation.

And because this requirement is not specified in the relevant standards,
it is wrong to assume valloc() returns locked pages. You cannot rely on
mmap() returning locked pages after mlockall() either, because you might
be exceeding resource limits.

> If the Linux kernel is not willing to accept the contract by
> mlockall(MCL_FUTURE), then it should not accept the call at all.

If the application wants locked pages, it either needs to call mmap()
explicitly, or use mlock() on the valloc()ed region. Even then,
allocation or mlock may fail due to resource constraints. I checked
FreeBSD 6-STABLE i386, Solaris 8 FCS SPARC and SUSE Linux 10.0 i386 on
this.

> In our case, the kernel did accept the call to mlockall(MCL_FUTURE), but
> later ignored this contract. This bug should be fixed.

The complete story is, condensed, and with return values, for a
setuid-root application:

geteuid() == 0;
mlockall(MCL_CURRENT|MCL_FUTURE) == (success);
seteuid(500) == (success);
valloc(64512 + pagesize) == NULL (failure);

Jörg, correct me if the valloc() figure is wrong.

valloc() called mmap() internally, tried to grab 1 MB, and failed with
EAGAIN - as we were able to see from the strace.

SuSE Linux 10.0, kernel 2.6.13-15.7-default #1 Tue Nov 29 14:32:29 UTC 2005
on i686 athlon i386 GNU/Linux

--
Matthias Andree
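The failure in the condensed story can be illustrated with a small user-space model. This is a simplified sketch under stated assumptions, not kernel code: it assumes the limit is compared in whole pages against already-locked memory plus the new request, and that a privileged caller is exempt. The function name `locked_mapping_allowed` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified model of the check applied when a new mapping must be locked
 * because MCL_FUTURE is in effect. Returns true if the mapping would be
 * allowed, false if it would be refused (EAGAIN in the strace above).
 */
bool locked_mapping_allowed(size_t locked_bytes, size_t request_bytes,
                            size_t limit_bytes, size_t pagesize,
                            bool privileged)
{
    assert(pagesize > 0);

    /* A privileged process is not subject to RLIMIT_MEMLOCK. */
    if (privileged)
        return true;

    /* The limit is effectively rounded down to whole pages. */
    size_t limit_pages = limit_bytes / pagesize;
    /* Locked memory plus the request, rounded up to whole pages. */
    size_t want_pages =
        (locked_bytes + request_bytes + pagesize - 1) / pagesize;

    return want_pages <= limit_pages;
}
```

With the figures from this thread (a 32 kB limit, a 1 MB mmap() request, 4 kB pages assumed), the model allows the mapping while the process is privileged and refuses it after seteuid(500). Since MCL_CURRENT was also requested, the already-locked amount was probably above the limit anyway, so even a much smaller request would presumably have failed.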
* Re: Rationale for RLIMIT_MEMLOCK?
From: Joerg Schilling @ 2006-01-23 21:23 UTC
To: schilling, matthias.andree; Cc: matthias.andree, linux-kernel, arjan

Matthias Andree <matthias.andree@gmx.de> wrote:

> > If the behavior described by Matthias is true for current Linux kernels,
> > then there is a clear bug that needs fixing.
>
> Jörg elided my lines that said valloc() was the function in question.
>
> Jörg, if we're talking about valloc(), this hasn't much to do with the
> kernel, but is a library issue.

From my understanding, the problem is that Linux first grants the
mlockall(MCL_FUTURE) call and later ignores this contract.

The fact that valloc() works in a way that is not comprehensible
seems to be another issue.

Libscg calls valloc(size) where size is less than 64 KB.
From the strace output from Matthias, it looks like valloc() first calls
brk() to extend the size of the data segment (probably to approach the next
pagesize-aligned border) and later calls mmap() to get 1 MB of memory.
Well, first it seems that valloc() tries to get too much memory, but this is
another story.

Inside the kernel handler for this call, the permission to lock the new
memory is checked _again_, and this is wrong as the request
for locking all future pages of the process has already been granted.

This looks similar to when I open() a file that may only be opened as root
and later switch my uid to some other id. If read() were implemented
the same way as Linux implements the locking, each read() call would again
check whether the current uid has permission to get access to the fd
from a filename. This is obviously wrong. The _process_ has been granted
the right to mlock all future pages and this is something that needs to
be honored until the process dies.

> There is _no_ documentation that says valloc() or memalign() or
> posix_memalign() is required to use mmap(). It works on some systems and
> for some allocation sizes as a side effect of the valloc()
> implementation.

The problem seems to be independent of how valloc() is implemented.

> And because this requirement is not specified in the relevant standards,
> it is wrong to assume valloc() returns locked pages. You cannot rely on
> mmap() returning locked pages after mlockall() either, because you might
> be exceeding resource limits.

If there were such resource limits, then they would need to be honored
regardless of the privileges of the process.

> > If the Linux kernel is not willing to accept the contract by
> > mlockall(MCL_FUTURE), then it should not accept the call at all.
>
> If the application wants locked pages, it either needs to call mmap()
> explicitly, or use mlock() on the valloc()ed region. Even then,
> allocation or mlock may fail due to resource constraints. I checked
> FreeBSD 6-STABLE i386, Solaris 8 FCS SPARC and SUSE Linux 10.0 i386 on
> this.

What did you check?

Solaris does not check for any privileges when calling mmap().
Solaris implements mlockall() via memcntl(), which contains the only
place where a check for secpolicy_lock_memory(CRED()) takes place.

> > In our case, the kernel did accept the call to mlockall(MCL_FUTURE), but
> > later ignored this contract. This bug should be fixed.
>
> The complete story is, condensed, and with return values, for a
> setuid-root application:
>
> geteuid() == 0;
> mlockall(MCL_CURRENT|MCL_FUTURE) == (success);
> seteuid(500) == (success);
> valloc(64512 + pagesize) == NULL (failure);
>
> Jörg, correct me if the valloc() figure is wrong.
>
> valloc() called mmap() internally, tried to grab 1 MB, and failed with
> EAGAIN - as we were able to see from the strace.

This is correct.

Returning EAGAIN seems to be a result of misunderstanding the POSIX
standard. The POSIX standard means real hardware resources when talking about

[EAGAIN]
    [ML] The mapping could not be locked in memory, if required by
    mlockall(), due to a lack of resources.

If Linux wants to add a new RLIMIT_MEMLOCK resource, it would need to
honor this resource independently of the user ID in order to avoid being
contradictory.

Jörg

--
EMail:joerg@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
      js@cs.tu-berlin.de (uni)
      schilling@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
* Re: Rationale for RLIMIT_MEMLOCK? 2006-01-23 21:23 ` Joerg Schilling @ 2006-01-23 22:05 ` Matthias Andree 0 siblings, 0 replies; 34+ messages in thread From: Matthias Andree @ 2006-01-23 22:05 UTC (permalink / raw) To: Joerg Schilling; +Cc: matthias.andree, linux-kernel, arjan Joerg Schilling schrieb am 2006-01-23: > Matthias Andree <matthias.andree@gmx.de> wrote: > > > > If the behavior described by Matthias is true for current Linuc kernels, > > > then there is a clean bug that needs fixing. > > > > Jörg elided my lines that said valloc() was the function in question. > > > > Jörg, if we're talking about valloc(), this hasn't much to do with the > > kernel, but is a library issue. > > From my understanding, the problem is that Linux first grants the > mlockall(MLC_FUTURE) call and later ignores this contract. ... > Inside the kernel handler for this call, the permission to lock the new > memory _again_ checks for permission and this is wrong as the request > for locking all future pages of the process already has been granted. I *do* think that the kernel refused our mmap() request on grounds of the RLIMIT_MEMLOCK (32 kB) and not any other reason, because running the same allocation code as root succeeds, and Linux 2.6.13 is documented to ignore RLIMIT_MEMLOCK for the super-user. And I do believe Linux is entirely on IEEE Std 1003.1-2001 grounds here. > > There is _no_ documentation that says valloc() or memalign() or > > posix_memalign() is required to use mmap(). It works on some systems and > > for some allocation sizes as a side effect of the valloc() > > implementation. > > The problem seems to be independend how valloc() is implemented. As far as the kernel is concerned, yes. As far as your application is concerned, valloc() does not provide "mapped" or "locked" pages, but "allocated". > > And because this requirement is not specified in the relevant standards, > > it is wrong to assume valloc() returns locked pages. 
You cannot rely on > > mmap() returning locked pages after mlockall() either, because you might > > be exceeding resource limits. > > If there were such resource limits, then they would need to be honored > regardless of the privileges of the process. That's a different story. > > > If the Linux kernel is not willing to accept the contract by > > > mlockall(MLC_FUTURE), then it should now accept the call at all. > > > > If the application wants locked pages, it either needs to call mmap() > > explicitly, or use mlock() on the valloc()ed region. Even then, > > allocation or mlock may fail due to resource constraints. I checked > > FreeBSD 6-STABLE i386, Solaris 8 FCS SPARC and SUSE Linux 10.0 i386 on > > this. > > What did you check? The mlockall() documentation. Any OS allows later mappings to fail if they cannot be locked, and this is what happens. The only troublesome spot that remains is valloc() using mmap() internally, which inherits the mlockall()/mmap() failure modes and causes bogus "out of memory" returns by valloc(). 1. valloc is not required to lock pages 2. yet it can fail if it cannot lock pages This is a problem from the applications POV, albeit one that is in glibc's memory allocator. mlockall() does NOT make promises HOW MUCH memory may be allocated in the future, and that is the problem at hand. Linux allows us 32 kB (as unprivileged user even, we don't get that with Solaris or FreeBSD!), but we want 63 kB and Linux says "Sorry, you can't have that. EAGAIN" > Returning EAGAIN seems to be a result of missunderstanding the POSIX > standard. The POSIX standard means real hardware resources when talking about Well... mlockall() allows for, "other implementation-defined limit[s]", so POSIX is not supportive of your argument here. > EAGAIN] > [ML] The mapping could not be locked in memory, if required by > mlockall(), due to a lack of resources. 
> > If linux likes to ass a new RLIMIT_MEMLOCK resource, it would be needed to > honor this resource independent from the user id in order to prevent being > contradictory. This is irrelevant to cdrecord, because it does not trip over this contradiction. If I were the cdrecord maintainer, I'd forget about mlockall() altogether because it's just too broad and doesn't allow something like "no more auto locking" without unlocking all locked pages (see also Lee Revell's earlier post), lock the FIFO, command data buffers and everything explicitly through mlock(), set the scheduler, open the device and then call setuid() to get rid of the saved set-user-id as well. This may be narrow-minded, but given mlock() is present in the BSD world (FreeBSD, NetBSD), in the SysV world (Solaris) and Linux, there's reason to support it, as these constitute a large user base. If anything then still fails (command filter), I'd ask the kernel guys how the restriction can be lifted so that cdrecord can work without ANY root privileges, in the most portable way. -- Matthias Andree ^ permalink raw reply [flat|nested] 34+ messages in thread
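The suggestion above, to lock individual buffers with mlock() rather than relying on mlockall(MCL_FUTURE), could look roughly like the following C sketch. The helper name alloc_locked() is hypothetical and not taken from cdrecord; it uses posix_memalign() for a page-aligned buffer and treats a failed mlock() as a warning rather than a fatal error, which is an assumption about the desired policy.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from cdrecord): allocate a page-aligned
 * buffer and try to lock exactly that region with mlock().
 * Returns the buffer, or NULL if the allocation itself failed.
 * *locked is set to 1 if mlock() succeeded and to 0 otherwise.
 */
static void *alloc_locked(size_t size, int *locked)
{
    void *buf = NULL;
    long pagesize = sysconf(_SC_PAGESIZE);
    int err;

    if (pagesize <= 0)
        pagesize = 4096;            /* assumed fallback page size */

    err = posix_memalign(&buf, (size_t)pagesize, size);
    if (err != 0) {
        errno = err;                /* posix_memalign returns the error */
        return NULL;
    }

    /* Lock only this region; this may fail with ENOMEM, EAGAIN or EPERM. */
    *locked = (mlock(buf, size) == 0);
    if (!*locked)
        fprintf(stderr, "Warning: mlock(%lu bytes) failed: %s\n",
                (unsigned long)size, strerror(errno));
    return buf;
}
```

A caller would do this for the FIFO and each command buffer while still privileged, and release a buffer with munlock() followed by free(). Unlike mlockall(MCL_FUTURE), a failure here is reported at the point where the lock is requested, not by a later unrelated allocation.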
* Re: Rationale for RLIMIT_MEMLOCK? 2006-01-23 20:30 ` Matthias Andree 2006-01-23 21:23 ` Joerg Schilling @ 2006-01-24 8:52 ` Arjan van de Ven 2006-01-24 9:08 ` Joerg Schilling 1 sibling, 1 reply; 34+ messages in thread From: Arjan van de Ven @ 2006-01-24 8:52 UTC (permalink / raw) To: Matthias Andree; +Cc: Joerg Schilling, linux-kernel > c() was the function in question. > > Jörg, if we're talking about valloc(), this hasn't much to do with the > kernel, but is a library issue. > > There is _no_ documentation that says valloc() or memalign() or > posix_memalign() is required to use mmap(). It works on some systems and > for some allocation sizes as a side effect of the valloc() > implementation. it doesn't matter. Regardless of the method, the memory has to be locked due to the FUTURE requirement. > And because this requirement is not specified in the relevant standards, > it is wrong to assume valloc() returns locked pages. is it? I sort of doubt that (but I'm not a standards expert, but I'd expect that "lock all in the future" applies to all memory, not just mmap'd memory > You cannot rely on > mmap() returning locked pages after mlockall() either, because you might > be exceeding resource limits. this is true and fully correct the situation is messy; I can see some value in the hack Ted proposed to just bump the rlimit automatically at an mlockall-done-by-root.. but to be fair it's a hack :( ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Rationale for RLIMIT_MEMLOCK? 2006-01-24 8:52 ` Arjan van de Ven @ 2006-01-24 9:08 ` Joerg Schilling 2006-01-24 9:15 ` Arjan van de Ven 2006-01-24 10:51 ` Matthias Andree 0 siblings, 2 replies; 34+ messages in thread From: Joerg Schilling @ 2006-01-24 9:08 UTC (permalink / raw) To: matthias.andree, arjan; +Cc: schilling, linux-kernel Arjan van de Ven <arjan@infradead.org> wrote: > > And because this requirement is not specified in the relevant standards, > > it is wrong to assume valloc() returns locked pages. > > is it? I sort of doubt that (but I'm not a standards expert, but I'd > expect that "lock all in the future" applies to all memory, not just > mmap'd memory I concur: Locking pages into core is a property/duty of the VM subsystem. If you have an orthogonal VM subsystem, you cannot later tell how a page was mapped into the user's address space. Even more: you may map a file to a alocation in the data segment of the proces (that has been retrieved via malloc()/brk()) and replace the related mapping with a mapped file. On Solaris, there is no difference. > > > You cannot rely on > > mmap() returning locked pages after mlockall() either, because you might > > be exceeding resource limits. > > this is true and fully correct > > > > the situation is messy; I can see some value in the hack Ted proposed to > just bump the rlimit automatically at an mlockall-done-by-root.. but to > be fair it's a hack :( As all other rlimits are honored even if you are root, it looks not orthogonal to disregard an existing RLIMIT_MEMLOCK rlimit if you are root. Jörg -- EMail:joerg@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin js@cs.tu-berlin.de (uni) schilling@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Rationale for RLIMIT_MEMLOCK? 2006-01-24 9:08 ` Joerg Schilling @ 2006-01-24 9:15 ` Arjan van de Ven 2006-01-24 9:18 ` Joerg Schilling 2006-01-24 21:28 ` Theodore Ts'o 2006-01-24 10:51 ` Matthias Andree 1 sibling, 2 replies; 34+ messages in thread From: Arjan van de Ven @ 2006-01-24 9:15 UTC (permalink / raw) To: Joerg Schilling; +Cc: matthias.andree, linux-kernel On Tue, 2006-01-24 at 10:08 +0100, Joerg Schilling wrote: > > the situation is messy; I can see some value in the hack Ted proposed to > > just bump the rlimit automatically at an mlockall-done-by-root.. but to > > be fair it's a hack :( > > As all other rlimits are honored even if you are root, it looks not orthogonal > to disregard an existing RLIMIT_MEMLOCK rlimit if you are root. that's another solution; give root a higher rlimit by default for this. It's also a bit messy, but a not-unreasonable default behavior. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Rationale for RLIMIT_MEMLOCK? 2006-01-24 9:15 ` Arjan van de Ven @ 2006-01-24 9:18 ` Joerg Schilling 2006-01-24 21:28 ` Theodore Ts'o 1 sibling, 0 replies; 34+ messages in thread From: Joerg Schilling @ 2006-01-24 9:18 UTC (permalink / raw) To: schilling, arjan; +Cc: matthias.andree, linux-kernel Arjan van de Ven <arjan@infradead.org> wrote: > On Tue, 2006-01-24 at 10:08 +0100, Joerg Schilling wrote: > > > the situation is messy; I can see some value in the hack Ted proposed to > > > just bump the rlimit automatically at an mlockall-done-by-root.. but to > > > be fair it's a hack :( > > > > As all other rlimits are honored even if you are root, it looks not orthogonal > > to disregard an existing RLIMIT_MEMLOCK rlimit if you are root. > > that's another solution; give root a higher rlimit by default for this. > It's also a bit messy, but a not-unreasonable default behavior. This would only make sense in case that you bump up the limit for processes that are suid root and do not lower it in case someone calls seteuid(). Jörg -- EMail:joerg@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin js@cs.tu-berlin.de (uni) schilling@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Rationale for RLIMIT_MEMLOCK? 2006-01-24 9:15 ` Arjan van de Ven 2006-01-24 9:18 ` Joerg Schilling @ 2006-01-24 21:28 ` Theodore Ts'o 2006-01-24 23:19 ` Edgar Toernig ` (2 more replies) 1 sibling, 3 replies; 34+ messages in thread From: Theodore Ts'o @ 2006-01-24 21:28 UTC (permalink / raw) To: Arjan van de Ven; +Cc: Joerg Schilling, matthias.andree, linux-kernel On Tue, Jan 24, 2006 at 10:15:40AM +0100, Arjan van de Ven wrote: > On Tue, 2006-01-24 at 10:08 +0100, Joerg Schilling wrote: > > > the situation is messy; I can see some value in the hack Ted proposed to > > > just bump the rlimit automatically at an mlockall-done-by-root.. but to > > > be fair it's a hack :( > > > > As all other rlimits are honored even if you are root, it looks not orthogonal > > to disregard an existing RLIMIT_MEMLOCK rlimit if you are root. > > that's another solution; give root a higher rlimit by default for this. > It's also a bit messy, but a not-unreasonable default behavior. I thought in the case we were talking about, the problem is that we have a setuid program which calls mlockall() but then later drops its privileges. So when it tries to allocate memories, RLIMIT_MEMLOCK applies again, and so all future memory allocations would fail. What I proposed is a hack, but strictly speaking not necessary according to the POSIX standards, but the problem is that a portable program can't be expected to know that Linux has a RLIMIT_MEMLOCK resource limit, such that a program which calls mlockall() and then drops privileges will work under Solaris and fail under Linux. Hence I why proposed a hack where mlockall() would adjust RLIMIT_MEMLOCK. Yes, no question it's a hack and a special case; the question is whether cure or the disease is worse. - Ted ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Rationale for RLIMIT_MEMLOCK? 2006-01-24 21:28 ` Theodore Ts'o @ 2006-01-24 23:19 ` Edgar Toernig 2006-01-25 15:38 ` Joerg Schilling 2006-01-24 23:26 ` Matthias Andree 2006-01-25 15:33 ` Joerg Schilling 2 siblings, 1 reply; 34+ messages in thread From: Edgar Toernig @ 2006-01-24 23:19 UTC (permalink / raw) To: Theodore Ts'o Cc: Arjan van de Ven, Joerg Schilling, matthias.andree, linux-kernel Theodore Ts'o wrote: > > ... proposed a hack where mlockall() would adjust RLIMIT_MEMLOCK. > Yes, no question it's a hack and a special case; the question is > whether cure or the disease is worse. What about exec? The memory locks are removed on exec but with that hack the raised limit would stay. Looks like a security bug. Ciao, ET. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-24 23:19 ` Edgar Toernig
@ 2006-01-25 15:38 ` Joerg Schilling
  0 siblings, 0 replies; 34+ messages in thread
From: Joerg Schilling @ 2006-01-25 15:38 UTC (permalink / raw)
  To: tytso, froese; +Cc: schilling, matthias.andree, linux-kernel, arjan

Edgar Toernig <froese@gmx.de> wrote:

> Theodore Ts'o wrote:
> >
> > ... proposed a hack where mlockall() would adjust RLIMIT_MEMLOCK.
> > Yes, no question it's a hack and a special case; the question is
> > whether cure or the disease is worse.
>
> What about exec? The memory locks are removed on exec but with that
> hack the raised limit would stay. Looks like a security bug.

The RLIMIT_MEMLOCK feature itself may be a security bug, implemented the
way it currently is.

For me it would make sense to be able to lock everything in core and then
be able to tell the system that at most 1 MB of additional memory may be
locked. In this case, there should be no general failure but the
possibility to verify that the value is sufficient for the usual cases.

Jörg

-- 
EMail:joerg@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
      js@cs.tu-berlin.de (uni)
      schilling@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

^ permalink raw reply [flat|nested] 34+ messages in thread
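Checking whether the limit is sufficient before allocating could be sketched in C as below. The helper memlock_limit_allows() is hypothetical. It only compares a requested size against the soft RLIMIT_MEMLOCK value, so it deliberately ignores memory that is already locked and any privilege that would bypass the limit; treat it as a rough pre-check, not as a guarantee that a later mlock() will succeed.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: returns 1 if `need` bytes fit under the soft
 * RLIMIT_MEMLOCK limit, 0 if they do not, and -1 if the limit cannot
 * be determined (no RLIMIT_MEMLOCK on this system, or getrlimit failed).
 * Assumption: already-locked memory and privileges are not considered.
 */
static int memlock_limit_allows(unsigned long need)
{
#ifdef RLIMIT_MEMLOCK
    struct rlimit rlim;

    if (getrlimit(RLIMIT_MEMLOCK, &rlim) < 0)
        return -1;
    if (rlim.rlim_cur == RLIM_INFINITY)
        return 1;
    return ((rlim_t)need <= rlim.rlim_cur) ? 1 : 0;
#else
    (void)need;
    return -1;
#endif
}
```

With the 32 kB default discussed in this thread, memlock_limit_allows(63UL * 1024) would return 0 for an unprivileged process, which matches the failing 63 kB allocation.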
* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-24 21:28 ` Theodore Ts'o
  2006-01-24 23:19 ` Edgar Toernig
@ 2006-01-24 23:26 ` Matthias Andree
  2006-01-24 23:27 ` Matthias Andree
  2006-01-25 15:33 ` Joerg Schilling
  2 siblings, 1 reply; 34+ messages in thread
From: Matthias Andree @ 2006-01-24 23:26 UTC (permalink / raw)
  To: Theodore Ts'o, Arjan van de Ven, Joerg Schilling, matthias.andree, linux-kernel

Theodore Ts'o wrote on 2006-01-24:

> I thought in the case we were talking about, the problem is that we
> have a setuid program which calls mlockall() but then later drops its
> privileges. So when it tries to allocate memory, RLIMIT_MEMLOCK
> applies again, and so all future memory allocations would fail.

That's the coarse view. In fact, the application does not call setuid()
at this time, but only seteuid(), so it can regain privileges later, and
will in fact do that.

The application in question does this:

(root here)
1  mlockall()
2  seteuid(500);   /* park privileges for a moment */
3  valloc(63 kB);  /* fails since 2.6.9's tight MEMLOCK limit */

The first patch I suggested for the application exchanged steps #2 and
#3 and works, but is not acceptable to Jörg. We haven't talked about the
reasons. The idea behind my patch was this: if it wants the memory
locked (which is a privileged operation on many systems anyway), then
why not allocate as root? Would this hurt portability to any other
system? I don't think so. Is such a rationale unreasonable in itself?
No, it is not.

Further patch suggestions went back and forth on raising the limit and
to what value.

The other problem is that glibc 2.3.5 is part of the story, but
off-topic here, because glibc is the link between valloc() (application
side) and mmap() (kernel side).

> What I proposed is a hack, [and] strictly speaking not necessary
> according to the POSIX standards, but the problem is that a portable
> program can't be expected to know that Linux has a RLIMIT_MEMLOCK
> resource limit, such that a program which calls mlockall() and then
> drops privileges will work under Solaris and fail under Linux. Hence
> why I proposed a hack where mlockall() would adjust RLIMIT_MEMLOCK.
> Yes, no question it's a hack and a special case; the question is
> whether cure or the disease is worse.

Is the KERNEL the right place to implement policy such as setting
locked-page limits to 32 kB?

What if the limit were RLIM_INFINITY for root processes instead of
hacking mlockall() and the resource checks?

-- 
Matthias Andree

^ permalink raw reply [flat|nested] 34+ messages in thread
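The reordering described in this message (allocate first, park privileges afterwards) can be sketched in C as follows. This is an illustration under assumptions, not the actual cdrecord patch: the function name alloc_then_park_privs() is hypothetical, posix_memalign() stands in for the valloc() call mentioned above, and a failing mlockall() is only reported as a warning.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch of the swapped steps: the buffer is allocated
 * while the process still has its privileges, and the effective uid
 * is changed only afterwards. Returns the buffer or NULL on failure.
 */
static void *alloc_then_park_privs(size_t size, uid_t unpriv_uid)
{
    void *buf = NULL;
    long pagesize = sysconf(_SC_PAGESIZE);
    int err;

    if (pagesize <= 0)
        pagesize = 4096;            /* assumed fallback page size */

    /* Step 1: request locking of current and future pages (best effort). */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0)
        perror("Warning: mlockall");

    /* Step 2 (formerly step 3): allocate while still privileged. */
    err = posix_memalign(&buf, (size_t)pagesize, size);
    if (err != 0) {
        errno = err;
        return NULL;
    }

    /* Step 3 (formerly step 2): only now park the privileges. */
    if (seteuid(unpriv_uid) < 0) {
        perror("seteuid");
        free(buf);
        return NULL;
    }
    return buf;
}
```

Because seteuid() leaves the saved set-user-ID untouched, a setuid-root program can still regain its privileges later, which is the behaviour the message describes.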
* Re: Rationale for RLIMIT_MEMLOCK? 2006-01-24 23:26 ` Matthias Andree @ 2006-01-24 23:27 ` Matthias Andree 0 siblings, 0 replies; 34+ messages in thread From: Matthias Andree @ 2006-01-24 23:27 UTC (permalink / raw) To: Theodore Ts'o, Arjan van de Ven, Joerg Schilling, linux-kernel Matthias Andree schrieb am 2006-01-25: > What if the limit were RLIM_INFINITY for root processes instead of > hacking mlockall() and the resource checks? OK, reading Edgar's hint, the answer is "It's a bad idea." -- Matthias Andree ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-24 21:28 ` Theodore Ts'o
  2006-01-24 23:19 ` Edgar Toernig
  2006-01-24 23:26 ` Matthias Andree
@ 2006-01-25 15:33 ` Joerg Schilling
  2006-01-25 16:01 ` Matthias Andree
  2 siblings, 1 reply; 34+ messages in thread
From: Joerg Schilling @ 2006-01-25 15:33 UTC (permalink / raw)
  To: tytso, arjan; +Cc: schilling, matthias.andree, linux-kernel

"Theodore Ts'o" <tytso@mit.edu> wrote:

> I thought in the case we were talking about, the problem is that we
> have a setuid program which calls mlockall() but then later drops its
> privileges. So when it tries to allocate memory, RLIMIT_MEMLOCK
> applies again, and so all future memory allocations would fail.
>
> What I proposed is a hack, but strictly speaking not necessary
> according to the POSIX standards, but the problem is that a portable
> program can't be expected to know that Linux has a RLIMIT_MEMLOCK
> resource limit, such that a program which calls mlockall() and then
> drops privileges will work under Solaris and fail under Linux. Hence
> why I proposed a hack where mlockall() would adjust RLIMIT_MEMLOCK.
> Yes, no question it's a hack and a special case; the question is
> whether cure or the disease is worse.

Maybe I should give some hints...

RLIMIT_MEMLOCK first appeared in BSD-4.4 around 1994.
The implementation has been incomplete since then and is partially
disabled (size check for mmap() in the kernel) on FreeBSD, as it was in
1994 on BSD-4.4.

FreeBSD currently uses a default value of RLIM_INFINITY for users.

I could add this piece of code to the euid == 0 part of cdrecord:

LOCAL void
raise_memlock()
{
#ifdef	RLIMIT_MEMLOCK
	struct rlimit rlim;

	rlim.rlim_cur = rlim.rlim_max = RLIM_INFINITY;

	if (setrlimit(RLIMIT_MEMLOCK, &rlim) < 0)
		errmsg("Warning: Cannot raise RLIMIT_MEMLOCK limits.");
#endif	/* RLIMIT_NOFILE */
}

Jörg

-- 
EMail:joerg@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
      js@cs.tu-berlin.de (uni)
      schilling@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-25 15:33 ` Joerg Schilling
@ 2006-01-25 16:01 ` Matthias Andree
  0 siblings, 0 replies; 34+ messages in thread
From: Matthias Andree @ 2006-01-25 16:01 UTC (permalink / raw)
  To: Joerg Schilling; +Cc: tytso, arjan, linux-kernel

Joerg Schilling wrote:

> RLIMIT_MEMLOCK first appeared in BSD-4.4 around 1994.
> The implementation has been incomplete since then and is partially
> disabled (size check for mmap() in the kernel) on FreeBSD, as it was in
> 1994 on BSD-4.4.
>
> FreeBSD currently uses a default value of RLIM_INFINITY for users.

And while it does that (or rather, does not distinguish between root and
unprivileged users), mlock() and mlockall() are privileged operations on
FreeBSD.

> I could add this piece of code to the euid == 0 part of cdrecord:
>
> LOCAL void
> raise_memlock()
> {
> #ifdef	RLIMIT_MEMLOCK
> 	struct rlimit rlim;
>
> 	rlim.rlim_cur = rlim.rlim_max = RLIM_INFINITY;
>
> 	if (setrlimit(RLIMIT_MEMLOCK, &rlim) < 0)
> 		errmsg("Warning: Cannot raise RLIMIT_MEMLOCK limits.");
> #endif	/* RLIMIT_NOFILE */
> }

Except that your new #endif comment is wrong (it names RLIMIT_NOFILE
rather than RLIMIT_MEMLOCK), that is exactly what I suggested and what
I've tried and found working.

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Rationale for RLIMIT_MEMLOCK? 2006-01-24 9:08 ` Joerg Schilling 2006-01-24 9:15 ` Arjan van de Ven @ 2006-01-24 10:51 ` Matthias Andree 1 sibling, 0 replies; 34+ messages in thread From: Matthias Andree @ 2006-01-24 10:51 UTC (permalink / raw) To: Joerg Schilling; +Cc: matthias.andree, arjan, linux-kernel Joerg Schilling schrieb am 2006-01-24: > Arjan van de Ven <arjan@infradead.org> wrote: > > > > And because this requirement is not specified in the relevant standards, > > > it is wrong to assume valloc() returns locked pages. > > > > is it? I sort of doubt that (but I'm not a standards expert, but I'd > > expect that "lock all in the future" applies to all memory, not just > > mmap'd memory > > I concur: > > Locking pages into core is a property/duty of the VM subsystem. But where is this laid down in the standard? There must be some part that defines this, else we cannot rely on it. The wording for malloc() and mmap() or mlock() is different. One talks about address space and mapping, whereas malloc() talks about "storage". Only I haven't got time to look for it now. Just that Solaris happens to do it doesn't make it a standard. -- Matthias Andree ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Rationale for RLIMIT_MEMLOCK? 2006-01-23 19:38 ` Joerg Schilling 2006-01-23 20:30 ` Matthias Andree @ 2006-01-23 20:30 ` Lee Revell 2006-01-23 21:33 ` Joerg Schilling 1 sibling, 1 reply; 34+ messages in thread From: Lee Revell @ 2006-01-23 20:30 UTC (permalink / raw) To: Joerg Schilling; +Cc: matthias.andree, arjan, linux-kernel On Mon, 2006-01-23 at 20:38 +0100, Joerg Schilling wrote: > Matthias Andree <matthias.andree@gmx.de> wrote: > > > On Mon, 23 Jan 2006, Arjan van de Ven wrote: > > > > > hmm... curious that mlockall() succeeds with only a 32kb rlimit.... > > > > It's quite obvious with the seteuid() shuffling behind the scenes of the > > app, for the mlockall() runs with euid==0, and the later mmap() with euid!=0. > > > > Clearly the application should do both with the same privilege or raise > > the RLIMIT_MEMLOCK while running with privileges. > > > > The question that's open is one for the libc guys: malloc(), valloc() > > and others seem to use mmap() on some occasions (for some allocation > > sizes) - at least malloc/malloc.c comments as of 2.3.4 suggest so -, and > > if this isn't orthogonal to mlockall() and set[e]uid() calls, the glibc > > is pretty deeply in trouble if the code calls mlockall(MLC_FUTURE) and > > then drops privileges. > > If the behavior described by Matthias is true for current Linuc kernels, > then there is a clean bug that needs fixing. > > If the Linux kernel is not willing to accept the contract by > mlockall(MLC_FUTURE), then it should now accept the call at all. > > In our case, the kernel did accept the call to mlockall(MLC_FUTURE), but later > ignores this contract. This bug should be fixed. Joerg, You will be happy to know that in future Linux distros, cdrecord will not require setuid to mlock() and get SCHED_FIFO - both are now controlled by rlimits, so if the distro ships with a sane PAM/group configuration, all you will need to do is add cdrecord users to the "realtime" or "cdrecord" or "audio" group. 
This will take a while to make it into distros as it requires changes to PAM and glibc in addition to the kernel. Lee ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-23 20:30 ` Lee Revell
@ 2006-01-23 21:33 ` Joerg Schilling
  0 siblings, 0 replies; 34+ messages in thread
From: Joerg Schilling @ 2006-01-23 21:33 UTC (permalink / raw)
  To: schilling, rlrevell; +Cc: matthias.andree, linux-kernel, arjan

Lee Revell <rlrevell@joe-job.com> wrote:

> > In our case, the kernel did accept the call to mlockall(MLC_FUTURE), but later
> > ignores this contract. This bug should be fixed.
>
> Joerg,
>
> You will be happy to know that in future Linux distros, cdrecord will
> not require setuid to mlock() and get SCHED_FIFO - both are now
> controlled by rlimits, so if the distro ships with a sane PAM/group
> configuration, all you will need to do is add cdrecord users to the
> "realtime" or "cdrecord" or "audio" group.
>
> This will take a while to make it into distros as it requires changes to
> PAM and glibc in addition to the kernel.

Well, on Solaris, running cdrecord root-less has been possible for two
years.

What you do is add a line

joerg::::profiles=CD RW

to /etc/user_attr and a line

CD RW:solaris:cmd:::/opt/schily/bin/cdrecord:
	privs=file_dac_read,sys_devices,proc_lock_memory,proc_priocntl,net_privaddr

to /etc/security/exec_attr, or just add a line

All:solaris:cmd:::/opt/schily/bin/cdrecord:
	privs=file_dac_read,sys_devices,proc_lock_memory,proc_priocntl,net_privaddr

to /etc/security/exec_attr.

The command is then executed via /usr/bin/pfexec and gets the listed
fine-grained privileges in addition to the basic privileges.

We plan to break sys_devices into more fine-grained privs that include
several levels of SCSI rights in the near future.

If Linux manages to do something similar, I would be happy. It is
obvious that this is something that could only be used if there is not
only kernel code to support fine-grained privs but also a user space
infrastructure that allows a seamless integration.
Jörg -- EMail:joerg@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin js@cs.tu-berlin.de (uni) schilling@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Rationale for RLIMIT_MEMLOCK? 2006-01-23 18:55 ` Matthias Andree 2006-01-23 19:04 ` Arjan van de Ven 2006-01-23 19:38 ` Joerg Schilling @ 2006-01-23 19:57 ` Lee Revell 2006-01-23 21:34 ` Theodore Ts'o 3 siblings, 0 replies; 34+ messages in thread From: Lee Revell @ 2006-01-23 19:57 UTC (permalink / raw) To: Matthias Andree; +Cc: Arjan van de Ven, Linux-Kernel mailing list On Mon, 2006-01-23 at 19:55 +0100, Matthias Andree wrote: > I'm asking the Bcc'd gentleman to reconsider mlockall() and perhaps > use explicit mlock() instead. Probably good advice, I have found mlockall() to be especially problematic with multithreaded programs and NPTL, as glibc eats RLIMIT_STACK of unswappable memory for each thread stack which defaults to 8MB here - you go OOM really quick like this. Most people don't seem to realize the need to set a sane value with pthread_attr_setstack(). (Even when not mlock'ed, insanely huge thread stack defaults seem to account for a lot of the visible bloat on the desktop - decreasing RLIMIT_STACK to 512KB reduces the footprint of Gnome 2.12 by 100+ MB.) Lee ^ permalink raw reply [flat|nested] 34+ messages in thread
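To illustrate the point about per-thread stacks, here is a minimal C sketch that creates a thread with an explicitly chosen stack size. It uses pthread_attr_setstacksize(), which sets only the size, rather than the pthread_attr_setstack() call named in the message, which also takes a caller-supplied stack address. The function names and the 512 kB figure in the usage note are illustrative assumptions.

```c
#include <pthread.h>
#include <stddef.h>

/* Trivial thread body for the sketch: records that it ran. */
static void *worker(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}

/*
 * Hypothetical helper: start one thread with the given stack size and
 * wait for it. Returns 0 on success or a pthread error number.
 */
static int run_with_small_stack(size_t stack_bytes, int *ran)
{
    pthread_attr_t attr;
    pthread_t tid;
    int err;

    err = pthread_attr_init(&attr);
    if (err != 0)
        return err;

    /* Request a stack smaller than the RLIMIT_STACK-derived default. */
    err = pthread_attr_setstacksize(&attr, stack_bytes);
    if (err == 0)
        err = pthread_create(&tid, &attr, worker, ran);

    pthread_attr_destroy(&attr);
    if (err != 0)
        return err;

    return pthread_join(tid, NULL);
}
```

Calling run_with_small_stack(512 * 1024, &ran) asks for a 512 kB stack, so a process that has called mlockall() pins far less memory per thread than it would with an 8 MB default. Older toolchains may need -pthread when compiling and linking.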
* Re: Rationale for RLIMIT_MEMLOCK? 2006-01-23 18:55 ` Matthias Andree ` (2 preceding siblings ...) 2006-01-23 19:57 ` Lee Revell @ 2006-01-23 21:34 ` Theodore Ts'o 2006-01-24 11:06 ` Matthias Andree 3 siblings, 1 reply; 34+ messages in thread From: Theodore Ts'o @ 2006-01-23 21:34 UTC (permalink / raw) To: Arjan van de Ven, Linux-Kernel mailing list On Mon, Jan 23, 2006 at 07:55:49PM +0100, Matthias Andree wrote: > The question that's open is one for the libc guys: malloc(), valloc() > and others seem to use mmap() on some occasions (for some allocation > sizes) - at least malloc/malloc.c comments as of 2.3.4 suggest so -, and > if this isn't orthogonal to mlockall() and set[e]uid() calls, the glibc > is pretty deeply in trouble if the code calls mlockall(MLC_FUTURE) and > then drops privileges. Maybe mlockall(MLC_FUTURE) when run with privileges should automatically adjust the RLIMIT_MEMLOCK resource limit? - Ted ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Rationale for RLIMIT_MEMLOCK? 2006-01-23 21:34 ` Theodore Ts'o @ 2006-01-24 11:06 ` Matthias Andree 0 siblings, 0 replies; 34+ messages in thread From: Matthias Andree @ 2006-01-24 11:06 UTC (permalink / raw) To: Theodore Ts'o, Arjan van de Ven, Linux-Kernel mailing list On Mon, 23 Jan 2006, Theodore Ts'o wrote: > On Mon, Jan 23, 2006 at 07:55:49PM +0100, Matthias Andree wrote: > > The question that's open is one for the libc guys: malloc(), valloc() > > and others seem to use mmap() on some occasions (for some allocation > > sizes) - at least malloc/malloc.c comments as of 2.3.4 suggest so -, and > > if this isn't orthogonal to mlockall() and set[e]uid() calls, the glibc > > is pretty deeply in trouble if the code calls mlockall(MLC_FUTURE) and > > then drops privileges. > > Maybe mlockall(MLC_FUTURE) when run with privileges should > automatically adjust the RLIMIT_MEMLOCK resource limit? Adding special cases to no end. Is this really sensible? How about leaving RLIMIT_MEMLOCK alone (and at RLIM_INFINITY) for root processes altogether? At least that wouldn't add a new special case but just change the existing one to remove an inconsistency, and the effect will be the same, only that it is inherited across seteuid(). I doubt that the kernel is the right place to implement policies that belong into user space. As long as the kernel is meant to be universal, any default will collide with an application's requirement sooner or later. -- Matthias Andree ^ permalink raw reply [flat|nested] 34+ messages in thread