Re: CD writing in future Linux (stirring up a hornets' nest) (was: Rationale for RLIMIT_MEMLOCK?)

* Re: CD writing in future Linux (stirring up a hornets' nest) (was:   Rationale for RLIMIT_MEMLOCK?)
       [not found]                   ` <5yi2X-zm-7@gated-at.bofh.it>
@ 2006-01-24  9:14                     ` Bodo Eggert
  2006-01-24 14:38                       ` Joerg Schilling
  0 siblings, 1 reply; 34+ messages in thread
From: Bodo Eggert @ 2006-01-24  9:14 UTC (permalink / raw)
  To: Joerg Schilling, rlrevell, matthias.andree, schilling, linux-kernel

Joerg Schilling <schilling@fokus.fraunhofer.de> wrote:

[...]
> On Solaris, you (currently) use a profile enabled shell (pfsh, pfksh or pfcsh)
> that calls getexecuser() in order to find whether there is a specific
> treatment needed. If this specific treatment is needed, then the shell calls
> execve(/usr/bin/pfexec cmd <args>)
> else it calls  execve(cmd <args>)
> 
> I recently voted to require all shells to be profile enabled by default.

Why? I assume there will only be a few programs that need to be run by a
wrapper, and mv /usr/bin/foo to /usr/pfexec-bin/foo;
echo $'#!/bin/sh\n/usr/sbin/pfexec /usr/pfexec-bin/foo "$@"' > /usr/bin/foo;
chmod 755 /usr/bin/foo
should be easier than patching e.g. all callers of cdrecord, and it won't
slow down starting non-profiled applications.

Possibly the pfexec can tell the application to be run by the basename (like
su1), in this case you'd add something like
"alias cdrecord /opt/schily/bin/cdrecord" to its configuration and link it
to /usr/bin/cdrecord.

> With the future plans for extending fine grained privs on Solaris, sending
> SCSI commands will become more than one priv.
> 
> I proposed to have a low priv right to send commands like inquiry and test
> unit ready. These commands may e.g. be sent without interfering with a concurrent
> CD/DVD write operation.
> 
> The next priv could be the permission for sending simple SCSI commands that
> allow reading from the device.
> 
> The next priv could be the permission for sending simple SCSI Commands that
> allow writing.
> 
> The final priv would allow even vendor specific commands: this is what
> cdrecord needs.

That sounds reasonable, but I wonder how you can get access to a device
file descriptor in order to do unprivileged access.
-- 
I thank GMX for sabotaging the use of my addresses by means of lies
spread via SPF.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: CD writing in future Linux (stirring up a hornets' nest) (was:    Rationale for RLIMIT_MEMLOCK?)
  2006-01-24  9:14                     ` CD writing in future Linux (stirring up a hornets' nest) (was: Rationale for RLIMIT_MEMLOCK?) Bodo Eggert
@ 2006-01-24 14:38                       ` Joerg Schilling
  2006-01-24 17:44                         ` CD writing in future Linux (stirring up a hornets' nest) Bodo Eggert
  0 siblings, 1 reply; 34+ messages in thread
From: Joerg Schilling @ 2006-01-24 14:38 UTC (permalink / raw)
  To: schilling, rlrevell, matthias.andree, linux-kernel, 7eggert

Bodo Eggert <harvested.in.lkml@7eggert.dyndns.org> wrote:

> Joerg Schilling <schilling@fokus.fraunhofer.de> wrote:
>
> [...]
> > On Solaris, you (currently) use a profile enabled shell (pfsh, pfksh or pfcsh)
> > that calls getexecuser() in order to find whether there is a specific
> > treatment needed. If this specific treatment is needed, then the shell calls
> > execve(/usr/bin/pfexec cmd <args>)
> > else it calls  execve(cmd <args>)
> > 
> > I recently voted to require all shells to be profile enabled by default.
>
> Why? I assume there will only be a few programs that need to be run by a
> wrapper, and mv /usr/bin/foo to /usr/pfexec-bin/foo;
> echo $'#!/bin/sh\n/usr/sbin/pfexec /usr/pfexec-bin/foo "$@"' > /usr/bin/foo;
> chmod 755 /usr/bin/foo
> should be easier than patching e.g. all callers of cdrecord, and it won't
> slow down starting non-profiled applications.

Because the architecture review committee decided this would be the right way.

Note that we are on a migration from the classical root/non-root UNIX to a fine 
grained privileges handling. The current documentation says that you need to 
have a profile enabled shell as your SHELL in order to be able to use a 
root-less Solaris.

> Possibly the pfexec can tell the application to be run by the basename (like
> su1), in this case you'd add something like
> "alias cdrecord /opt/schily/bin/cdrecord" to its configuration and link it
> to /usr/bin/cdrecord.

But you are right that another way would be to use something like "isaexec"

> > The final priv would allow even vendor specific commands: this is what
> > cdrecord needs.
>
> That sounds reasonable, but I wonder how you can get access to a device
> file descriptor in order to do unprivileged access.

This is something that needs to be discussed. Last night, I found that there 
should be a way to run cdrecord without the need to have the "file_dac_read"
privilege. I'll discuss this with the security group.



Jörg

-- 
 EMail:joerg@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js@cs.tu-berlin.de                (uni)  
       schilling@fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: CD writing in future Linux (stirring up a hornets' nest)
  2006-01-24 14:38                       ` Joerg Schilling
@ 2006-01-24 17:44                         ` Bodo Eggert
  0 siblings, 0 replies; 34+ messages in thread
From: Bodo Eggert @ 2006-01-24 17:44 UTC (permalink / raw)
  To: Joerg Schilling; +Cc: rlrevell, matthias.andree, linux-kernel, 7eggert

On Tue, 24 Jan 2006, Joerg Schilling wrote:
> Bodo Eggert <harvested.in.lkml@7eggert.dyndns.org> wrote:
> > Joerg Schilling <schilling@fokus.fraunhofer.de> wrote:
> > [...]
> > > On Solaris, you (currently) use a profile enabled shell (pfsh, pfksh or pfcsh)
> > > that calls getexecuser() in order to find whether there is a specific
> > > treatment needed. If this specific treatment is needed, then the shell calls
> > > execve(/usr/bin/pfexec cmd <args>)
> > > else it calls  execve(cmd <args>)
> > > 
> > > I recently voted to require all shells to be profile enabled by default.
> >
> > Why? I assume there will only be a few programs that need to be run by a
> > wrapper, and mv /usr/bin/foo to /usr/pfexec-bin/foo;
> > echo $'#!/bin/sh\n/usr/sbin/pfexec /usr/pfexec-bin/foo "$@"' > /usr/bin/foo;
> > chmod 755 /usr/bin/foo
> > should be easier than patching e.g. all callers of cdrecord, and it won't
> > slow down starting non-profiled applications.
> 
> Because the architecture review committee decided this would be the right way.
> 
> Note that we are on a migration from the classical root/non-root UNIX to a fine 
> grained privileges handling. The current documentation says that you need to 
> have a profile enabled shell as your SHELL in order to be able to use a 
> root-less Solaris.

If the shell was the only program calling cdrecord, this would work out as 
expected.
-- 
My mail reader can beat up your mail reader. 

^ permalink raw reply	[flat|nested] 34+ messages in thread

[parent not found: <5ygDT-6LK-3@gated-at.bofh.it>]

[parent not found: <5yscc-68j-5@gated-at.bofh.it>]

[parent not found: <5ysvk-6JI-5@gated-at.bofh.it>]

[parent not found: <5ysvk-6JI-3@gated-at.bofh.it>]

[parent not found: <5yEn7-7Or-21@gated-at.bofh.it>]

[parent not found: <5yUUI-6JR-15@gated-at.bofh.it>]

* Re: Rationale for RLIMIT_MEMLOCK?
       [not found]                         ` <5yUUI-6JR-15@gated-at.bofh.it>
@ 2006-01-26  0:12                           ` Bodo Eggert
  0 siblings, 0 replies; 34+ messages in thread
From: Bodo Eggert @ 2006-01-26  0:12 UTC (permalink / raw)
  To: Joerg Schilling, schilling, matthias.andree, linux-kernel, tytso, arjan

Joerg Schilling <schilling@fokus.fraunhofer.de> wrote:

> I could add this piece of code to the euid == 0 part of cdrecord:
> 
> LOCAL void
> raise_memlock()
> { 
> #ifdef  RLIMIT_MEMLOCK
>         struct rlimit rlim;
>  
>         rlim.rlim_cur = rlim.rlim_max = RLIM_INFINITY;

I think you should rather use the size you're going to mlock, or at least
the upper bound.
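
A minimal sketch of that suggestion, assuming the caller knows an upper
bound on what it is going to lock. The name raise_memlock() and its
"needed" parameter are illustrative only and are not taken from cdrecord:

```c
#include <sys/resource.h>

/*
 * Raise the RLIMIT_MEMLOCK soft limit to at least `needed` bytes rather
 * than to RLIM_INFINITY. The hard limit is only raised if it is too low,
 * which needs privilege. Returns 0 on success, -1 on failure (errno set).
 */
static int raise_memlock(rlim_t needed)
{
#ifdef RLIMIT_MEMLOCK
    struct rlimit rlim;

    if (getrlimit(RLIMIT_MEMLOCK, &rlim) < 0)
        return -1;

    /* Nothing to do if the current soft limit already covers the request. */
    if (rlim.rlim_cur != RLIM_INFINITY && rlim.rlim_cur < needed) {
        rlim.rlim_cur = needed;
        if (rlim.rlim_max != RLIM_INFINITY && rlim.rlim_max < needed)
            rlim.rlim_max = needed;
        return setrlimit(RLIMIT_MEMLOCK, &rlim);
    }
#else
    (void)needed;   /* platform has no such limit */
#endif
    return 0;
}
```

It would be called while still privileged, before any seteuid(), with the
buffer size plus some slack for whatever else the process keeps locked.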
-- 
I thank GMX for sabotaging the use of my addresses by means of lies
spread via SPF.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Rationale for RLIMIT_MEMLOCK?
@ 2006-02-03 20:49 Michael Kerrisk
  0 siblings, 0 replies; 34+ messages in thread
From: Michael Kerrisk @ 2006-02-03 20:49 UTC (permalink / raw)
  To: matthias.andree
  Cc: Theodore Ts'o, linux-kernel, arjan, Joerg Schilling, michael.kerrisk

> > Matthias Andree <matthias.andree@gmx.de> wrote:

[...]

> The complete story is, condensed, and with return values, for a
> setuid-root application:
> 
>   geteuid() == 0;
>   mlockall(MCL_CURRENT|MCL_FUTURE) == (success);
>   seteuid(500) == (success);
>   valloc(64512 + pagesize) == NULL (failure);

[...]

A late follow-up to this thread. I've added the following text
to the mlockall() manual page under BUGS:

    Since kernel 2.6.9, if a privileged process calls 
    mlockall(MCL_FUTURE) and later drops privileges
    (CAP_IPC_LOCK), then subsequent memory allocations
    (e.g., mmap(2), sbrk(2)) will fail if the 
    RLIMIT_MEMLOCK resource limit is encountered.

The change will be in man-pages 2.23.

Cheers,

Michael

-- 
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 

Want to help with man page maintenance?  
Grab the latest tarball at
ftp://ftp.win.tue.nl/pub/linux-local/manpages/, 
read the HOWTOHELP file and grep the source 
files for 'FIXME'.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Rationale for RLIMIT_MEMLOCK?
@ 2006-01-23 10:56 Matthias Andree
  2006-01-23 11:05 ` Arjan van de Ven
  0 siblings, 1 reply; 34+ messages in thread
From: Matthias Andree @ 2006-01-23 10:56 UTC (permalink / raw)
  To: Linux-Kernel mailing list

Greetings,

debugging an application problem that used to mlockall(...FUTURE) and
failed with a subsequent mmap(), I came across the manual page for
setrlimit (see below for the relevant excerpt). I have several questions
concerning the rationale:

1. What is the reason we're having special treatment
   for the super-user here?

2. Why is it the opposite of what 2.6.8.1 and earlier did?

3. Why is this inconsistent with all other RLIMIT_*?
   Neither of which cares if a process is privileged or not.

4. Is the default hard limit of 32 kB initialized by the kernel or
   by some script in SUSE 10.0? If it's the kernel: why is the limit so
   low, and why isn't just the soft limit set?

   "[...]
    RLIMIT_MEMLOCK
      The maximum number of bytes of memory that may  be  locked  into
      RAM.  In effect this limit is rounded down to the nearest multi-
      ple of the system page size.  This limit  affects  mlock(2)  and
      mlockall(2)  and  the mmap(2) MAP_LOCKED operation.  Since Linux
      2.6.9 it also affects the shmctl(2) SHM_LOCK operation, where it
      sets a maximum on the total bytes in shared memory segments (see
      shmget(2)) that may be locked by the real user ID of the calling
      process.   The  shmctl(2) SHM_LOCK locks are accounted for sepa-
      rately  from  the  per-process  memory  locks   established   by
      mlock(2),  mlockall(2),  and  mmap(2)  MAP_LOCKED; a process can
      lock bytes up to this limit in each of these two categories.  In
      Linux  kernels before 2.6.9, this limit controlled the amount of
      memory that could be locked  by  a  privileged  process.   Since
      Linux 2.6.9, no limits are placed on the amount of memory that a
      privileged process may lock, and this limit instead governs  the
      amount of memory that an unprivileged process may lock. [...]"
   (getrlimit(2), man-pages-2.07)
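
To see what limit a process actually runs under, something like the
following sketch can be used. It reads the soft limit and rounds it down
to a multiple of the page size, as the excerpt describes. The function
name effective_memlock_limit() is made up for illustration:

```c
#include <sys/resource.h>
#include <unistd.h>

/*
 * Return the RLIMIT_MEMLOCK soft limit in bytes, rounded down to a
 * multiple of the system page size. Returns RLIM_INFINITY if the limit
 * is unlimited and 0 if the limit cannot be determined.
 */
static rlim_t effective_memlock_limit(void)
{
    struct rlimit rlim;
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || getrlimit(RLIMIT_MEMLOCK, &rlim) < 0)
        return 0;
    if (rlim.rlim_cur == RLIM_INFINITY)
        return RLIM_INFINITY;

    /* Drop the partial page at the end, if any. */
    return rlim.rlim_cur - (rlim.rlim_cur % (rlim_t)page);
}
```

Per the excerpt, on 2.6.9 and later this value only constrains
unprivileged processes.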

-- 
Matthias Andree

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-23 10:56 Matthias Andree
@ 2006-01-23 11:05 ` Arjan van de Ven
  2006-01-23 16:54   ` Matthias Andree
  0 siblings, 1 reply; 34+ messages in thread
From: Arjan van de Ven @ 2006-01-23 11:05 UTC (permalink / raw)
  To: Matthias Andree; +Cc: Linux-Kernel mailing list

> 
> 1. What is the reason we're having special treatment
>    for the super-user here?

it's quite common to allow root (or more specifically, the right capability)
to override rlimits. Many such security checks behave that way so it's
only "just" to treat this one like that as well.

> 2. Why is it the opposite of what 2.6.8.1 and earlier did?

the earlier behavior didn't really make sense, and gave cause to
multimedia apps running as root only to be able to mlock etc etc. Now
this can be dynamically controlled instead.

> 4. Is the default hard limit of 32 kB initialized by the kernel or

the kernel has a relatively low default. The reason is simple: allow too
much mlock and the user can DoS the machine too easily. The kernel default
should be safe, the admin / distro can very easily override anyway.

You may ask: why is it not zero?
It is very useful for many things to have a "small" mlock area. gpg, ssh
and basically anything that works with keys and passwords. Small
relative to the other resources such a process takes (eg kernel stacks
etc).

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-23 11:05 ` Arjan van de Ven
@ 2006-01-23 16:54   ` Matthias Andree
  2006-01-23 17:00     ` Arjan van de Ven
  0 siblings, 1 reply; 34+ messages in thread
From: Matthias Andree @ 2006-01-23 16:54 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Linux-Kernel mailing list, mtk-manpages

On Mon, 23 Jan 2006, Arjan van de Ven wrote:

> > 
> > 1. What is the reason we're having special treatment
> >    for the super-user here?
> 
> it's quite common to allow root (or more specifically, the right capability)
> to override rlimits. Many such security checks behave that way so it's
> only "just" to treat this one like that as well.

Why is RLIMIT_MEMLOCK special enough to warrant special treatment like
this? The right capability should be able to override with setrlimit(2)
anyways, right?

> > 2. Why is it the opposite of what 2.6.8.1 and earlier did?
> 
> the earlier behavior didn't really make sense, and gave cause to
> multimedia apps running as root only to be able to mlock etc etc. Now
> this can be dynamically controlled instead.

Quoting the manpage: "In Linux kernels before 2.6.9, this limit
controlled the amount of memory that could be locked by a privileged
process."

This is nonsense, and it appears as though 2.6.8 and earlier didn't
apply the limit to unprivileged processes. Should the behavior stay as
inconsistent as it's now, I'd suggest to reword this to "...before
2.6.9, this limit controlled the amount of memory that could be locked
by /any/ process." or something even better if someone can think of
such. (manpages maintainer Cc'd)

> > 4. Is the default hard limit of 32 kB initialized by the kernel or
> 
> the kernel has a relatively low default. The reason is simple: allow too
> much mlock and the user can DoS the machine too easily. The kernel default
> should be safe, the admin / distro can very easily override anyway.

This doesn't appear to happen for SUSE 10.0, which causes trouble with
some of the "multimedia apps" BTW... apparently the limit was lowered at
the same time as the root restrictions were relaxed.

Such changes in behavior aren't adequate for 2.6.X, there are way too
many applications that can't be bothered to check the patchlevel of the
kernel, and it's totally unintuitive to users, too. Aside from the fact
that most distros have settled on one kernel.

> You may ask: why is it not zero?

No, I'm not doing that. I rather wonder why it's so low, or whom a certain
percentage such as RAM >> 5 (that's 3.125 %) would hurt. Allowing
unlimited memory allocation while at the same time allowing only 32 kB
of mlock()ed memory seems disproportionate to me.
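
For concreteness, the RAM >> 5 figure works out as follows. This is only
an arithmetic sketch; the function name and the 1 GiB example value are
assumptions, not anything the kernel does:

```c
#include <stdint.h>

/*
 * A limit of "RAM >> 5" is RAM divided by 32, i.e. 3.125 % of RAM.
 * For a machine with 1 GiB of RAM that is 32 MiB.
 */
static uint64_t ram_fraction_limit(uint64_t ram_bytes)
{
    return ram_bytes >> 5;
}
```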

-- 
Matthias Andree

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-23 16:54   ` Matthias Andree
@ 2006-01-23 17:00     ` Arjan van de Ven
  2006-01-23 18:01       ` Matthias Andree
  0 siblings, 1 reply; 34+ messages in thread
From: Arjan van de Ven @ 2006-01-23 17:00 UTC (permalink / raw)
  To: Matthias Andree; +Cc: Linux-Kernel mailing list, mtk-manpages

> > > 4. Is the default hard limit of 32 kB initialized by the kernel or
> > 
> > the kernel has a relatively low default. The reason is simple: allow too
> > much mlock and the user can DoS the machine too easily. The kernel default
> > should be safe, the admin / distro can very easily override anyway.
> 
> This doesn't appear to happen for SUSE 10.0, which causes trouble with
> some of the "multimedia apps" BTW... apparently the limit was lowered at
> the same time as the root restrictions were relaxed.

yes the behavior is like this

                 root                non-root
before        about half of ram      nothing
after         all of ram             by default small, increasable

> Such changes in behavior aren't adequate for 2.6.X, there are way too
> many applications that can't be bothered to check the patchlevel of the
> kernel, and it's totally unintuitive to users, too. 

there is NO fundamental change here other than a *general* relaxing.
This is important to note: Apps that could mlock before STILL can mlock.
Only apps that would depend on mlock failing with a security check, and
only those who do small portions, break now because suddenly the mlock
succeeds. Big deal... those would have broken when run as root already

> No, I'm not doing that. I rather wonder why it's so low, or whom a certain
> percentage such as RAM >> 5 (that's 3.125 %) would hurt. A

because it's generally a PER PROCESS limit, so fork 60 times and kaboom
things explode. (You can argue  you can forkbomb anyway, but that's
where the process count rlimit comes in)
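
The per-process argument can be checked with simple arithmetic. The
sketch below uses hypothetical helper names and assumes every process
locks its full allowance:

```c
#include <stdint.h>

/* Total memory that nproc processes can pin if each may lock `limit` bytes. */
static uint64_t total_lockable(uint64_t limit, unsigned nproc)
{
    return limit * (uint64_t)nproc;
}

/* Non-zero if nproc processes together could pin all of RAM. */
static int can_pin_all_ram(uint64_t ram_bytes, uint64_t limit, unsigned nproc)
{
    return total_lockable(limit, nproc) >= ram_bytes;
}
```

With a limit of RAM >> 5, 32 processes are already enough to pin
everything, so 60 forks are well past that point.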

> Allowing
> unlimited memory allocation while at the same time allowing only 32 kB
> of mlock()ed memory seems disproportionate to me.

it's not. Normal memory is swappable. And thus a far less rare commodity
than precious pinned down memory.

What application do you have in mind that broke by this relaxing of
rules?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-23 17:00     ` Arjan van de Ven
@ 2006-01-23 18:01       ` Matthias Andree
  2006-01-23 18:13         ` Arjan van de Ven
  0 siblings, 1 reply; 34+ messages in thread
From: Matthias Andree @ 2006-01-23 18:01 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Linux-Kernel mailing list

On Mon, 23 Jan 2006, Arjan van de Ven wrote:

> yes the behavior is like this
> 
>                  root                non-root
> before        about half of ram      nothing
> after         all of ram             by default small, increasable
> [...]
> What application do you have in mind that broke by this relaxing of
> rules?

This is not something I'd like to disclose here yet.

It is an application that calls mlockall(MCL_CURRENT|MCL_FUTURE) and
apparently copes with mlockall() returning EPERM (or doesn't even try
it) but can apparently NOT cope with valign() tripping over mmap() ==
-1/EAGAIN.

The relevant people are Bcc:d.

-- 
Matthias Andree

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-23 18:01       ` Matthias Andree
@ 2006-01-23 18:13         ` Arjan van de Ven
  2006-01-23 18:55           ` Matthias Andree
  0 siblings, 1 reply; 34+ messages in thread
From: Arjan van de Ven @ 2006-01-23 18:13 UTC (permalink / raw)
  To: Matthias Andree; +Cc: Linux-Kernel mailing list

On Mon, 2006-01-23 at 19:01 +0100, Matthias Andree wrote:
> On Mon, 23 Jan 2006, Arjan van de Ven wrote:
> 
> > yes the behavior is like this
> > 
> >                  root                non-root
> > before        about half of ram      nothing
> > after         all of ram             by default small, increasable
> > [...]
> > What application do you have in mind that broke by this relaxing of
> > rules?
> 
> This is not something I'd like to disclose here yet.
> 
> It is an application that calls mlockall(MCL_CURRENT|MCL_FUTURE) and
> apparently copes with mlockall() returning EPERM 

hmm... curious that mlockall() succeeds with only a 32kb rlimit....




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-23 18:13         ` Arjan van de Ven
@ 2006-01-23 18:55           ` Matthias Andree
  2006-01-23 19:04             ` Arjan van de Ven
                               ` (3 more replies)
  0 siblings, 4 replies; 34+ messages in thread
From: Matthias Andree @ 2006-01-23 18:55 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Linux-Kernel mailing list

On Mon, 23 Jan 2006, Arjan van de Ven wrote:

> hmm... curious that mlockall() succeeds with only a 32kb rlimit....

It's quite obvious with the seteuid() shuffling behind the scenes of the
app, for the mlockall() runs with euid==0, and the later mmap() with euid!=0.

Clearly the application should do both with the same privilege or raise
the RLIMIT_MEMLOCK while running with privileges.

The question that's open is one for the libc guys: malloc(), valloc()
and others seem to use mmap() on some occasions (for some allocation
sizes) - at least malloc/malloc.c comments as of 2.3.4 suggest so -, and
if this isn't orthogonal to mlockall() and set[e]uid() calls, the glibc
is pretty deeply in trouble if the code calls mlockall(MCL_FUTURE) and
then drops privileges.

The function in question appears to be valloc() with glibc 2.3.5.

In this light, mlockall(MCL_FUTURE) is pretty useless, since there is no
way to undo MCL_FUTURE without unlocking all pages at the same time.
Particularly so for setuid apps...

I'm asking the Bcc'd gentleman to reconsider mlockall() and perhaps use
explicit mlock() instead.
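
A rough sketch of what the explicit variant could look like, assuming the
application only needs one page-aligned transfer buffer locked. The
helper names alloc_locked() and free_locked() are invented here and are
not part of the application in question:

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Allocate a page-aligned buffer of `len` bytes and lock just that buffer.
 * Returns NULL if the allocation or the mlock() fails, e.g. because the
 * request exceeds RLIMIT_MEMLOCK for an unprivileged caller.
 */
static void *alloc_locked(size_t len)
{
    void *buf = NULL;
    long page = sysconf(_SC_PAGESIZE);

    if (len == 0 || page <= 0)
        return NULL;
    if (posix_memalign(&buf, (size_t)page, len) != 0)
        return NULL;
    if (mlock(buf, len) != 0) {
        free(buf);          /* do not hand out an unlocked buffer */
        return NULL;
    }
    return buf;
}

/* Unlock and release a buffer obtained from alloc_locked(). */
static void free_locked(void *buf, size_t len)
{
    if (buf != NULL) {
        munlock(buf, len);
        free(buf);
    }
}
```

Unlike mlockall(MCL_FUTURE), a failure here shows up at the mlock() call
itself, and later unrelated allocations are not affected by the limit.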

-- 
Matthias Andree

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-23 18:55           ` Matthias Andree
@ 2006-01-23 19:04             ` Arjan van de Ven
  2006-01-23 19:38             ` Joerg Schilling
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 34+ messages in thread
From: Arjan van de Ven @ 2006-01-23 19:04 UTC (permalink / raw)
  To: Matthias Andree; +Cc: Linux-Kernel mailing list

On Mon, 2006-01-23 at 19:55 +0100, Matthias Andree wrote:
> On Mon, 23 Jan 2006, Arjan van de Ven wrote:
> 
> > hmm... curious that mlockall() succeeds with only a 32kb rlimit....
> 
> It's quite obvious with the seteuid() shuffling behind the scenes of the
> app, for the mlockall() runs with euid==0, and the later mmap() with euid!=0.

hmm how on earth was that supposed to work at all????




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-23 18:55           ` Matthias Andree
  2006-01-23 19:04             ` Arjan van de Ven
@ 2006-01-23 19:38             ` Joerg Schilling
  2006-01-23 20:30               ` Matthias Andree
  2006-01-23 20:30               ` Lee Revell
  2006-01-23 19:57             ` Lee Revell
  2006-01-23 21:34             ` Theodore Ts'o
  3 siblings, 2 replies; 34+ messages in thread
From: Joerg Schilling @ 2006-01-23 19:38 UTC (permalink / raw)
  To: matthias.andree, arjan; +Cc: linux-kernel

Matthias Andree <matthias.andree@gmx.de> wrote:

> On Mon, 23 Jan 2006, Arjan van de Ven wrote:
>
> > hmm... curious that mlockall() succeeds with only a 32kb rlimit....
>
> It's quite obvious with the seteuid() shuffling behind the scenes of the
> app, for the mlockall() runs with euid==0, and the later mmap() with euid!=0.
>
> Clearly the application should do both with the same privilege or raise
> the RLIMIT_MEMLOCK while running with privileges.
>
> The question that's open is one for the libc guys: malloc(), valloc()
> and others seem to use mmap() on some occasions (for some allocation
> sizes) - at least malloc/malloc.c comments as of 2.3.4 suggest so -, and
> if this isn't orthogonal to mlockall() and set[e]uid() calls, the glibc
> is pretty deeply in trouble if the code calls mlockall(MCL_FUTURE) and
> then drops privileges.

If the behavior described by Matthias is true for current Linux kernels,
then there is a clean bug that needs fixing.

If the Linux kernel is not willing to accept the contract by 
mlockall(MCL_FUTURE), then it should not accept the call at all.

In our case, the kernel did accept the call to mlockall(MCL_FUTURE), but later 
ignores this contract. This bug should be fixed.

Jörg

-- 
 EMail:joerg@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js@cs.tu-berlin.de                (uni)  
       schilling@fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-23 19:38             ` Joerg Schilling
@ 2006-01-23 20:30               ` Matthias Andree
  2006-01-23 21:23                 ` Joerg Schilling
  2006-01-24  8:52                 ` Arjan van de Ven
  2006-01-23 20:30               ` Lee Revell
  1 sibling, 2 replies; 34+ messages in thread
From: Matthias Andree @ 2006-01-23 20:30 UTC (permalink / raw)
  To: Joerg Schilling; +Cc: matthias.andree, arjan, linux-kernel

Joerg Schilling wrote on 2006-01-23:

> Matthias Andree <matthias.andree@gmx.de> wrote:
> 
> > On Mon, 23 Jan 2006, Arjan van de Ven wrote:
> >
> > > hmm... curious that mlockall() succeeds with only a 32kb rlimit....
> >
> > It's quite obvious with the seteuid() shuffling behind the scenes of the
> > app, for the mlockall() runs with euid==0, and the later mmap() with euid!=0.
> >
> > Clearly the application should do both with the same privilege or raise
> > the RLIMIT_MEMLOCK while running with privileges.
> >
> > The question that's open is one for the libc guys: malloc(), valloc()
> > and others seem to use mmap() on some occasions (for some allocation
> > sizes) - at least malloc/malloc.c comments as of 2.3.4 suggest so -, and
> > if this isn't orthogonal to mlockall() and set[e]uid() calls, the glibc
> > is pretty deeply in trouble if the code calls mlockall(MCL_FUTURE) and
> > then drops privileges.
> 
> If the behavior described by Matthias is true for current Linux kernels,
> then there is a clean bug that needs fixing.

Jörg elided my lines that said valloc() was the function in question.

Jörg, if we're talking about valloc(), this hasn't much to do with the
kernel, but is a library issue.

There is _no_ documentation that says valloc() or memalign() or
posix_memalign() is required to use mmap(). It works on some systems and
for some allocation sizes as a side effect of the valloc()
implementation.

And because this requirement is not specified in the relevant standards,
it is wrong to assume valloc() returns locked pages. You cannot rely on
mmap() returning locked pages after mlockall() either, because you might
be exceeding resource limits.

> If the Linux kernel is not willing to accept the contract by 
> mlockall(MCL_FUTURE), then it should not accept the call at all.

If the application wants locked pages, it either needs to call mmap()
explicitly, or use mlock() on the valloc()ed region. Even then,
allocation or mlock may fail due to resource constraints. I checked
FreeBSD 6-STABLE i386, Solaris 8 FCS SPARC and SUSE Linux 10.0 i386 on
this.

> In our case, the kernel did accept the call to mlockall(MCL_FUTURE), but later 
> ignores this contract. This bug should be fixed.

The complete story is, condensed, and with return values, for a
setuid-root application:

  geteuid() == 0;
  mlockall(MCL_CURRENT|MCL_FUTURE) == (success);
  seteuid(500) == (success);
  valloc(64512 + pagesize) == NULL (failure);

Jörg, correct me if the valloc() figure is wrong.

valloc() called mmap() internally, tried to grab 1 MB, and failed with
EAGAIN - as we were able to see from the strace.

SuSE Linux 10.0, kernel 2.6.13-15.7-default #1 Tue Nov 29 14:32:29 UTC 2005
on i686 athlon i386 GNU/Linux

-- 
Matthias Andree

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-23 20:30               ` Matthias Andree
@ 2006-01-23 21:23                 ` Joerg Schilling
  2006-01-23 22:05                   ` Matthias Andree
  2006-01-24  8:52                 ` Arjan van de Ven
  1 sibling, 1 reply; 34+ messages in thread
From: Joerg Schilling @ 2006-01-23 21:23 UTC (permalink / raw)
  To: schilling, matthias.andree; +Cc: matthias.andree, linux-kernel, arjan

Matthias Andree <matthias.andree@gmx.de> wrote:

> > If the behavior described by Matthias is true for current Linux kernels,
> > then there is a clean bug that needs fixing.
>
> Jörg elided my lines that said valloc() was the function in question.
>
> Jörg, if we're talking about valloc(), this hasn't much to do with the
> kernel, but is a library issue.

From my understanding, the problem is that Linux first grants the 
mlockall(MCL_FUTURE) call and later ignores this contract.

The fact that valloc() works in a way that is not comprehensible
seems to be another issue. Libscg calls valloc(size) where size is less than
64 KB. From the strace output from Matthias, it looks like valloc first calls 
brk() to extend the size of the data segment (probably to approach the next
pagesize aligned border) and later calls mmap() to get 1 MB of memory.
Well first it seems that valloc() tries to get too much memory but this 
is another story.

Inside the kernel handler for this call, the permission to lock the new 
memory _again_ checks for permission and this is wrong as the request
for locking all future pages of the process already has been granted.

This looks similar to when I open() a file that may only be opened as root
and later switch my uid to some other id. If read() were implemented the 
same way as Linux implements the locking, each read() call would again check
whether the current uid would have permission to get access to the fd from a 
filename. This is obviously wrong. The _process_ has been granted the rights 
to mlock all future pages and this is something that needs to be honored until 
the process dies.

> There is _no_ documentation that says valloc() or memalign() or
> posix_memalign() is required to use mmap(). It works on some systems and
> for some allocation sizes as a side effect of the valloc()
> implementation.

The problem seems to be independent of how valloc() is implemented.

> And because this requirement is not specified in the relevant standards,
> it is wrong to assume valloc() returns locked pages. You cannot rely on
> mmap() returning locked pages after mlockall() either, because you might
> be exceeding resource limits.

If there were such resource limits, then they would need to be honored
regardless of the privileges of the process.

> > If the Linux kernel is not willing to accept the contract by 
> > mlockall(MCL_FUTURE), then it should not accept the call at all.
>
> If the application wants locked pages, it either needs to call mmap()
> explicitly, or use mlock() on the valloc()ed region. Even then,
> allocation or mlock may fail due to resource constraints. I checked
> FreeBSD 6-STABLE i386, Solaris 8 FCS SPARC and SUSE Linux 10.0 i386 on
> this.

What did you check?

Solaris does not check for any privileges when calling mmap().

Solaris implements mlockall() via memcntl which contains the only 
place where a check for secpolicy_lock_memory(CRED()) takes place.

> > In our case, the kernel did accept the call to mlockall(MCL_FUTURE), but later 
> > ignores this contract. This bug should be fixed.
>
> The complete story is, condensed, and with return values, for a
> setuid-root application:
>
>   geteuid() == 0;
>   mlockall(MLC_CURRENT|MLC_FUTURE) == (success);
>   seteuid(500) == (success);
>   valloc(64512 + pagesize) == NULL (failure);
>
> Jörg, correct me if the valloc() figure is wrong.
>
> valloc() called mmap() internally, tried to grab 1 MB, and failed with
> EAGAIN - as we were able to see from the strace.

This is correct.
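
The sequence quoted above can be approximated without a setuid binary. This is a sketch under assumptions, not cdrecord's code: it lowers the soft RLIMIT_MEMLOCK in a child process in place of the seteuid() step, and it calls mmap() directly in place of valloc(). Note that <sys/mman.h> spells the flags MCL_CURRENT and MCL_FUTURE (the thread writes MLC_). The function name is made up for this example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* MAP_ANONYMOUS under strict -std= flags */
#endif
#include <errno.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Runs the sequence in a child so the caller's address space is untouched.
 * Returns 0 if the mapping succeeded, the errno value from mmap() if it
 * failed, or -1 if the setup (fork, setrlimit, mlockall) failed.
 */
int mmap_after_mlockall(size_t limit_bytes, size_t map_bytes)
{
        int status;
        pid_t pid = fork();

        if (pid < 0)
                return -1;
        if (pid == 0) {
                struct rlimit rl;
                void *p;

                if (getrlimit(RLIMIT_MEMLOCK, &rl) < 0)
                        _exit(255);
                rl.rlim_cur = limit_bytes;      /* lower only the soft limit */
                if (setrlimit(RLIMIT_MEMLOCK, &rl) < 0)
                        _exit(255);
                if (mlockall(MCL_FUTURE) < 0)   /* request locking of future mappings */
                        _exit(255);
                p = mmap(NULL, map_bytes, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                _exit(p == MAP_FAILED ? errno : 0);
        }
        if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
                return -1;
        return WEXITSTATUS(status) == 255 ? -1 : WEXITSTATUS(status);
}
```

Called as mmap_after_mlockall(32 * 1024, 64512 + 4096), an unprivileged process on Linux would be expected to get EAGAIN back, while a process that is allowed to ignore the limit would get 0. The outcome depends on the kernel and on the privileges of the caller.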

Returning EAGAIN seems to be the result of misunderstanding the POSIX
standard. The POSIX standard means real hardware resources when talking about

[EAGAIN]
        [ML]  The mapping could not be locked in memory, if required by
        mlockall(), due to a lack of resources.

If Linux wants to add a new RLIMIT_MEMLOCK resource, it would need to
honor this resource independently of the user id in order to avoid being
contradictory.

Jörg

-- 
 EMail:joerg@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js@cs.tu-berlin.de                (uni)  
       schilling@fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-23 21:23                 ` Joerg Schilling
@ 2006-01-23 22:05                   ` Matthias Andree
  0 siblings, 0 replies; 34+ messages in thread
From: Matthias Andree @ 2006-01-23 22:05 UTC (permalink / raw)
  To: Joerg Schilling; +Cc: matthias.andree, linux-kernel, arjan

Joerg Schilling wrote on 2006-01-23:

> Matthias Andree <matthias.andree@gmx.de> wrote:
> 
> > > If the behavior described by Matthias is true for current Linux kernels,
> > > then there is a clean bug that needs fixing.
> >
> > Jörg elided my lines that said valloc() was the function in question.
> >
> > Jörg, if we're talking about valloc(), this hasn't much to do with the
> > kernel, but is a library issue.
> 
> From my understanding, the problem is that Linux first grants the 
> mlockall(MLC_FUTURE) call and later ignores this contract.
...
> Inside the kernel handler for this call, the permission to lock the new 
> memory _again_ checks for permission and this is wrong as the request
> for locking all future pages of the process already has been granted.

I *do* think that the kernel refused our mmap() request on grounds of
the RLIMIT_MEMLOCK (32 kB) and not any other reason, because running the
same allocation code as root succeeds, and Linux 2.6.13 is documented to
ignore RLIMIT_MEMLOCK for the super-user.

And I do believe Linux is entirely on IEEE Std 1003.1-2001 grounds here.

> > There is _no_ documentation that says valloc() or memalign() or
> > posix_memalign() is required to use mmap(). It works on some systems and
> > for some allocation sizes as a side effect of the valloc()
> > implementation.
> 
> The problem seems to be independent of how valloc() is implemented.

As far as the kernel is concerned, yes.

As far as your application is concerned, valloc() does not provide
"mapped" or "locked" pages, but "allocated".

> > And because this requirement is not specified in the relevant standards,
> > it is wrong to assume valloc() returns locked pages. You cannot rely on
> > mmap() returning locked pages after mlockall() either, because you might
> > be exceeding resource limits.
> 
> If there were such resource limits, then they would need to be honored
> regardless of the privileges of the process.

That's a different story.

> > > If the Linux kernel is not willing to accept the contract by 
> > > mlockall(MLC_FUTURE), then it should not accept the call at all.
> >
> > If the application wants locked pages, it either needs to call mmap()
> > explicitly, or use mlock() on the valloc()ed region. Even then,
> > allocation or mlock may fail due to resource constraints. I checked
> > FreeBSD 6-STABLE i386, Solaris 8 FCS SPARC and SUSE Linux 10.0 i386 on
> > this.
> 
> What did you check?

The mlockall() documentation. Any OS allows later mappings to fail if
they cannot be locked, and this is what happens.

The only troublesome spot that remains is valloc() using mmap()
internally, which inherits the mlockall()/mmap() failure modes and
causes bogus "out of memory" returns by valloc().

1. valloc is not required to lock pages
2. yet it can fail if it cannot lock pages

This is a problem from the application's POV, albeit one that is in
glibc's memory allocator.

mlockall() does NOT make promises HOW MUCH memory may be allocated in
the future, and that is the problem at hand. Linux allows us 32 kB (even as
an unprivileged user; we don't get that with Solaris or FreeBSD!), but
we want 63 kB and Linux says "Sorry, you can't have that. EAGAIN".
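
The arithmetic behind that refusal can be written down as a small model. This is a deliberately simplified sketch and not the kernel's code: it counts bytes instead of pages, and the function and parameter names are made up for this example.

```c
#include <stddef.h>

/*
 * Simplified model of a locked-memory limit check.
 * Returns 1 if the request would be allowed, 0 if it would be refused.
 * A privileged caller is assumed to be exempt from the limit, as the
 * thread describes for the super-user on Linux 2.6.13.
 */
int memlock_request_allowed(size_t already_locked, size_t request,
                            size_t limit, int privileged)
{
        if (privileged)
                return 1;
        if (request > limit)            /* also avoids underflow below */
                return 0;
        return already_locked <= limit - request;
}
```

With a 32 kB limit, a 63 kB request is refused for an unprivileged caller no matter how little is locked already, which matches the failure described above.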

> Returning EAGAIN seems to be the result of misunderstanding the POSIX
> standard. The POSIX standard means real hardware resources when talking about

Well... mlockall() allows for, "other implementation-defined limit[s]",
so POSIX is not supportive of your argument here.

> [EAGAIN]
>         [ML]  The mapping could not be locked in memory, if required by 
> 	mlockall(), due to a lack of resources.  
> 
> If Linux wants to add a new RLIMIT_MEMLOCK resource, it would need to
> honor this resource independently of the user id in order to avoid being
> contradictory.

This is irrelevant to cdrecord, because it does not trip over this
contradiction.

If I were the cdrecord maintainer, I'd forget about mlockall()
altogether because it's just too broad and doesn't allow something like
"no more auto locking" without unlocking all locked pages (see also Lee
Revell's earlier post). Instead I'd lock the FIFO, command data buffers and
everything else explicitly through mlock(), set the scheduler, open the
device and then call setuid() to get rid of the saved set-user-id as
well. This may be narrow-minded, but given mlock() is present in the BSD
world (FreeBSD, NetBSD), in the SysV world (Solaris) and Linux, there's
reason to support it, as these constitute a large user base.
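
The explicit-mlock() approach could look roughly like the sketch below. It is not cdrecord code; the helper names alloc_locked() and free_locked() are made up for this example, and a real program would call them for the FIFO and the command buffers while still privileged, before dropping privileges.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* posix_memalign()/mlock() under strict -std= flags */
#endif
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Allocates a page-aligned buffer and locks exactly that range.
 * Returns 0 and stores the buffer in *out, or returns an errno value
 * and stores NULL in *out.
 */
int alloc_locked(size_t len, void **out)
{
        long page = sysconf(_SC_PAGESIZE);
        void *p = NULL;
        int rc;

        *out = NULL;
        if (page <= 0)
                return EINVAL;
        rc = posix_memalign(&p, (size_t)page, len);
        if (rc != 0)
                return rc;
        if (mlock(p, len) < 0) {
                rc = errno;             /* save before free() can change it */
                free(p);
                return rc;
        }
        *out = p;
        return 0;
}

/* Unlocks and releases a buffer obtained from alloc_locked(). */
void free_locked(void *p, size_t len)
{
        munlock(p, len);
        free(p);
}
```

Unlike mlockall(), this locks only the named buffers, so later allocations made by libc are not subject to the locked-memory limit.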

If anything then still fails (command filter), I'd ask the kernel guys
how the restriction can be lifted so that cdrecord can work without ANY
root privileges, in the most portable way.

-- 
Matthias Andree

* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-23 20:30               ` Matthias Andree
  2006-01-23 21:23                 ` Joerg Schilling
@ 2006-01-24  8:52                 ` Arjan van de Ven
  2006-01-24  9:08                   ` Joerg Schilling
  1 sibling, 1 reply; 34+ messages in thread
From: Arjan van de Ven @ 2006-01-24  8:52 UTC (permalink / raw)
  To: Matthias Andree; +Cc: Joerg Schilling, linux-kernel

> Jörg elided my lines that said valloc() was the function in question.
> 
> Jörg, if we're talking about valloc(), this hasn't much to do with the
> kernel, but is a library issue.
> 
> There is _no_ documentation that says valloc() or memalign() or
> posix_memalign() is required to use mmap(). It works on some systems and
> for some allocation sizes as a side effect of the valloc()
> implementation.

it doesn't matter. Regardless of the method, the memory has to be locked
due to the FUTURE requirement.



> And because this requirement is not specified in the relevant standards,
> it is wrong to assume valloc() returns locked pages. 

is it? I sort of doubt that (I'm not a standards expert, but I'd
expect that "lock all in the future" applies to all memory, not just
mmap'd memory).

> You cannot rely on
> mmap() returning locked pages after mlockall() either, because you might
> be exceeding resource limits.

this is true and fully correct



the situation is messy; I can see some value in the hack Ted proposed to
just bump the rlimit automatically at an mlockall-done-by-root.. but to
be fair it's a hack :(



* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-24  8:52                 ` Arjan van de Ven
@ 2006-01-24  9:08                   ` Joerg Schilling
  2006-01-24  9:15                     ` Arjan van de Ven
  2006-01-24 10:51                     ` Matthias Andree
  0 siblings, 2 replies; 34+ messages in thread
From: Joerg Schilling @ 2006-01-24  9:08 UTC (permalink / raw)
  To: matthias.andree, arjan; +Cc: schilling, linux-kernel

Arjan van de Ven <arjan@infradead.org> wrote:

> > And because this requirement is not specified in the relevant standards,
> > it is wrong to assume valloc() returns locked pages. 
>
> is it? I sort of doubt that (I'm not a standards expert, but I'd
> expect that "lock all in the future" applies to all memory, not just
> mmap'd memory).

I concur:

Locking pages into core is a property/duty of the VM subsystem.
If you have an orthogonal VM subsystem, you cannot later tell how a page was
mapped into the user's address space. Even more: you may map a file to a
location in the data segment of the process (one that has been obtained via
malloc()/brk()) and so replace the related mapping with a mapped file.

On Solaris, there is no difference.

>
> > You cannot rely on
> > mmap() returning locked pages after mlockall() either, because you might
> > be exceeding resource limits.
>
> this is true and fully correct
>
>
>
> the situation is messy; I can see some value in the hack Ted proposed to
> just bump the rlimit automatically at an mlockall-done-by-root.. but to
> be fair it's a hack :(

As all other rlimits are honored even if you are root, it does not look
orthogonal to disregard an existing RLIMIT_MEMLOCK rlimit if you are root.

Jörg

-- 
 EMail:joerg@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js@cs.tu-berlin.de                (uni)  
       schilling@fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-24  9:08                   ` Joerg Schilling
@ 2006-01-24  9:15                     ` Arjan van de Ven
  2006-01-24  9:18                       ` Joerg Schilling
  2006-01-24 21:28                       ` Theodore Ts'o
  2006-01-24 10:51                     ` Matthias Andree
  1 sibling, 2 replies; 34+ messages in thread
From: Arjan van de Ven @ 2006-01-24  9:15 UTC (permalink / raw)
  To: Joerg Schilling; +Cc: matthias.andree, linux-kernel

On Tue, 2006-01-24 at 10:08 +0100, Joerg Schilling wrote:
> > the situation is messy; I can see some value in the hack Ted proposed to
> > just bump the rlimit automatically at an mlockall-done-by-root.. but to
> > be fair it's a hack :(
> 
> As all other rlimits are honored even if you are root, it looks not orthogonal 
> to disregard an existing RLIMIT_MEMLOCK rlimit if you are root.

that's another solution; give root a higher rlimit by default for this.
It's also a bit messy, but a not-unreasonable default behavior.



* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-24  9:15                     ` Arjan van de Ven
@ 2006-01-24  9:18                       ` Joerg Schilling
  2006-01-24 21:28                       ` Theodore Ts'o
  1 sibling, 0 replies; 34+ messages in thread
From: Joerg Schilling @ 2006-01-24  9:18 UTC (permalink / raw)
  To: schilling, arjan; +Cc: matthias.andree, linux-kernel

Arjan van de Ven <arjan@infradead.org> wrote:

> On Tue, 2006-01-24 at 10:08 +0100, Joerg Schilling wrote:
> > > the situation is messy; I can see some value in the hack Ted proposed to
> > > just bump the rlimit automatically at an mlockall-done-by-root.. but to
> > > be fair it's a hack :(
> > 
> > As all other rlimits are honored even if you are root, it looks not orthogonal 
> > to disregard an existing RLIMIT_MEMLOCK rlimit if you are root.
>
> that's another solution; give root a higher rlimit by default for this.
> It's also a bit messy, but a not-unreasonable default behavior.

This would only make sense if you bump up the limit for processes
that are suid root and do not lower it when someone calls seteuid().

Jörg

-- 
 EMail:joerg@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js@cs.tu-berlin.de                (uni)  
       schilling@fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-24  9:15                     ` Arjan van de Ven
  2006-01-24  9:18                       ` Joerg Schilling
@ 2006-01-24 21:28                       ` Theodore Ts'o
  2006-01-24 23:19                         ` Edgar Toernig
                                           ` (2 more replies)
  1 sibling, 3 replies; 34+ messages in thread
From: Theodore Ts'o @ 2006-01-24 21:28 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Joerg Schilling, matthias.andree, linux-kernel

On Tue, Jan 24, 2006 at 10:15:40AM +0100, Arjan van de Ven wrote:
> On Tue, 2006-01-24 at 10:08 +0100, Joerg Schilling wrote:
> > > the situation is messy; I can see some value in the hack Ted proposed to
> > > just bump the rlimit automatically at an mlockall-done-by-root.. but to
> > > be fair it's a hack :(
> > 
> > As all other rlimits are honored even if you are root, it looks not orthogonal 
> > to disregard an existing RLIMIT_MEMLOCK rlimit if you are root.
> 
> that's another solution; give root a higher rlimit by default for this.
> It's also a bit messy, but a not-unreasonable default behavior.

I thought in the case we were talking about, the problem is that we
have a setuid program which calls mlockall() but then later drops its
privileges.  So when it tries to allocate memory, RLIMIT_MEMLOCK
applies again, and so all future memory allocations would fail.

What I proposed is a hack, and strictly speaking not necessary
according to the POSIX standards, but the problem is that a portable
program can't be expected to know that Linux has a RLIMIT_MEMLOCK
resource limit, such that a program which calls mlockall() and then
drops privileges will work under Solaris and fail under Linux.  Hence
why I proposed a hack where mlockall() would adjust RLIMIT_MEMLOCK.
Yes, no question it's a hack and a special case; the question is
whether the cure or the disease is worse.

						- Ted

* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-24 21:28                       ` Theodore Ts'o
@ 2006-01-24 23:19                         ` Edgar Toernig
  2006-01-25 15:38                           ` Joerg Schilling
  2006-01-24 23:26                         ` Matthias Andree
  2006-01-25 15:33                         ` Joerg Schilling
  2 siblings, 1 reply; 34+ messages in thread
From: Edgar Toernig @ 2006-01-24 23:19 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Arjan van de Ven, Joerg Schilling, matthias.andree, linux-kernel

Theodore Ts'o wrote:
>
> ... proposed a hack where mlockall() would adjust RLIMIT_MEMLOCK.
> Yes, no question it's a hack and a special case; the question is
> whether cure or the disease is worse.

What about exec?  The memory locks are removed on exec but with that
hack the raised limit would stay.  Looks like a security bug.

Ciao, ET.

* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-24 23:19                         ` Edgar Toernig
@ 2006-01-25 15:38                           ` Joerg Schilling
  0 siblings, 0 replies; 34+ messages in thread
From: Joerg Schilling @ 2006-01-25 15:38 UTC (permalink / raw)
  To: tytso, froese; +Cc: schilling, matthias.andree, linux-kernel, arjan

Edgar Toernig <froese@gmx.de> wrote:

> Theodore Ts'o wrote:
> >
> > ... proposed a hack where mlockall() would adjust RLIMIT_MEMLOCK.
> > Yes, no question it's a hack and a special case; the question is
> > whether cure or the disease is worse.
>
> What about exec?  The memory locks are removed on exec but with that
> hack the raised limit would stay.  Looks like a security bug.

The RLIMIT_MEMLOCK feature itself may be a security bug the way it is
currently implemented.

For me it would make sense to be able to lock everything in core and then
be able to tell the system that at most 1MB of additional memory may be locked.

In this case, there should be no general failure but the possibility to
verify that the value is sufficient for usual cases.

Jörg

-- 
 EMail:joerg@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js@cs.tu-berlin.de                (uni)  
       schilling@fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-24 21:28                       ` Theodore Ts'o
  2006-01-24 23:19                         ` Edgar Toernig
@ 2006-01-24 23:26                         ` Matthias Andree
  2006-01-24 23:27                           ` Matthias Andree
  2006-01-25 15:33                         ` Joerg Schilling
  2 siblings, 1 reply; 34+ messages in thread
From: Matthias Andree @ 2006-01-24 23:26 UTC (permalink / raw)
  To: Theodore Ts'o, Arjan van de Ven, Joerg Schilling,
	matthias.andree, linux-kernel

Theodore Ts'o wrote on 2006-01-24:

> I thought in the case we were talking about, the problem is that we
> have a setuid program which calls mlockall() but then later drops its
> privileges.  So when it tries to allocate memories, RLIMIT_MEMLOCK
> applies again, and so all future memory allocations would fail.  

That's the coarse view. In fact, the application does not call setuid()
at this time, but only seteuid(), so it can regain privileges later, and
will in fact do that.

The application in question does this:

(root here)
1 mlockall()
2 seteuid(500);  /* park privileges for a moment */
3 valloc(63 kB); /* fails since 2.6.9's tight MEMLOCK limit */

The first patch I suggested for the application exchanged steps #2 and
#3 and works, but is not acceptable to Jörg. We haven't talked about the
reasons.

The idea behind my patch was this: if it wants the memory locked (which
is a privileged operation on many systems anyways), then why not
allocate as root? Would this hurt portability to any other system? I
don't think so. Is such a rationale unreasonable in itself? Not either.

Further patch suggestions went back and forth on raising the limit,
and to what value.

The other problem is that glibc 2.3.5 is part of the story, but
off-topic here, because glibc is the link between valloc() (application
side) and the mmap() (kernel side).

> What I proposed is a hack, [and] strictly speaking not necessary
> according to the POSIX standards, but the problem is that a portable
> program can't be expected to know that Linux has a RLIMIT_MEMLOCK
> resource limit, such that a program which calls mlockall() and then
> drops privileges will work under Solaris and fail under Linux.  Hence
> why I proposed a hack where mlockall() would adjust RLIMIT_MEMLOCK.
> Yes, no question it's a hack and a special case; the question is
> whether cure or the disease is worse.

Is the KERNEL the right place to implement policy such as setting
locked-page limits to 32 kB?

What if the limit were RLIM_INFINITY for root processes instead of
hacking mlockall() and the resource checks?

-- 
Matthias Andree

* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-24 23:26                         ` Matthias Andree
@ 2006-01-24 23:27                           ` Matthias Andree
  0 siblings, 0 replies; 34+ messages in thread
From: Matthias Andree @ 2006-01-24 23:27 UTC (permalink / raw)
  To: Theodore Ts'o, Arjan van de Ven, Joerg Schilling, linux-kernel

Matthias Andree wrote on 2006-01-25:

> What if the limit were RLIM_INFINITY for root processes instead of
> hacking mlockall() and the resource checks?

OK, reading Edgar's hint, the answer is "It's a bad idea."

-- 
Matthias Andree

* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-24 21:28                       ` Theodore Ts'o
  2006-01-24 23:19                         ` Edgar Toernig
  2006-01-24 23:26                         ` Matthias Andree
@ 2006-01-25 15:33                         ` Joerg Schilling
  2006-01-25 16:01                           ` Matthias Andree
  2 siblings, 1 reply; 34+ messages in thread
From: Joerg Schilling @ 2006-01-25 15:33 UTC (permalink / raw)
  To: tytso, arjan; +Cc: schilling, matthias.andree, linux-kernel

"Theodore Ts'o" <tytso@mit.edu> wrote:

> I thought in the case we were talking about, the problem is that we
> have a setuid program which calls mlockall() but then later drops its
> privileges.  So when it tries to allocate memories, RLIMIT_MEMLOCK
> applies again, and so all future memory allocations would fail.  
>
> What I proposed is a hack, but strictly speaking not necessary
> according to the POSIX standards, but the problem is that a portable
> program can't be expected to know that Linux has a RLIMIT_MEMLOCK
> resource limit, such that a program which calls mlockall() and then
> drops privileges will work under Solaris and fail under Linux.  Hence
> why I proposed a hack where mlockall() would adjust RLIMIT_MEMLOCK.
> Yes, no question it's a hack and a special case; the question is
> whether cure or the disease is worse.

Maybe I should give some hints...

RLIMIT_MEMLOCK first appeared in BSD-4.4 around 1994.
The implementation has been incomplete since then and is partially disabled
(size check for mmap() in the kernel) on FreeBSD, as it was in 1994 on BSD-4.4.

FreeBSD currently uses a default value of RLIM_INFINITY for users.

I could add this piece of code to the euid == 0 part of cdrecord:

LOCAL void 
raise_memlock() 
{ 
#ifdef  RLIMIT_MEMLOCK 
        struct rlimit rlim; 
 
        rlim.rlim_cur = rlim.rlim_max = RLIM_INFINITY; 
 
        if (setrlimit(RLIMIT_MEMLOCK, &rlim) < 0) 
                errmsg("Warning: Cannot raise RLIMIT_MEMLOCK limits."); 
#endif  /* RLIMIT_NOFILE */ 
} 

Jörg

-- 
 EMail:joerg@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js@cs.tu-berlin.de                (uni)  
       schilling@fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-25 15:33                         ` Joerg Schilling
@ 2006-01-25 16:01                           ` Matthias Andree
  0 siblings, 0 replies; 34+ messages in thread
From: Matthias Andree @ 2006-01-25 16:01 UTC (permalink / raw)
  To: Joerg Schilling; +Cc: tytso, arjan, linux-kernel

Joerg Schilling wrote:

> RLIMIT_MEMLOCK first appeared in BSD-4.4 around 1994.
> The implementation has been incomplete since then and is partially disabled
> (size check for mmap() in the kernel) on FreeBSD, as it was in 1994 on BSD-4.4.
> 
> FreeBSD currently uses a default value of RLIM_INFINITY for users.

And while it does that (or rather, does not distinguish between root and
unprivileged users), mlock() and mlockall() are privileged operations on
FreeBSD.

> I could add this piece of code to the euid == 0 part of cdrecord:
> 
> LOCAL void 
> raise_memlock() 
> { 
> #ifdef  RLIMIT_MEMLOCK 
>         struct rlimit rlim; 
>  
>         rlim.rlim_cur = rlim.rlim_max = RLIM_INFINITY; 
>  
>         if (setrlimit(RLIMIT_MEMLOCK, &rlim) < 0) 
>                 errmsg("Warning: Cannot raise RLIMIT_MEMLOCK limits."); 
> #endif  /* RLIMIT_NOFILE */ 
> } 

Except that your new #endif comment is wrong, that is exactly what I
suggested and what I've tried and found working.
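
The quoted function sets both limits to RLIM_INFINITY, which requires privileges. For comparison, here is a sketch of a variant that needs none, because a process may always raise its soft limit up to its hard limit. It was not proposed in this thread, and the name raise_memlock_soft() is made up for this example.

```c
#include <sys/resource.h>

/*
 * Raises the soft RLIMIT_MEMLOCK to the current hard limit.
 * Returns 0 on success (or if the limit does not exist), -1 on error.
 */
int raise_memlock_soft(void)
{
#ifdef RLIMIT_MEMLOCK
        struct rlimit rlim;

        if (getrlimit(RLIMIT_MEMLOCK, &rlim) < 0)
                return -1;
        rlim.rlim_cur = rlim.rlim_max;  /* hard limit stays untouched */
        if (setrlimit(RLIMIT_MEMLOCK, &rlim) < 0)
                return -1;
#endif  /* RLIMIT_MEMLOCK */
        return 0;
}
```

This only helps if the hard limit is large enough, for example because the administrator raised it for the user.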

* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-24  9:08                   ` Joerg Schilling
  2006-01-24  9:15                     ` Arjan van de Ven
@ 2006-01-24 10:51                     ` Matthias Andree
  1 sibling, 0 replies; 34+ messages in thread
From: Matthias Andree @ 2006-01-24 10:51 UTC (permalink / raw)
  To: Joerg Schilling; +Cc: matthias.andree, arjan, linux-kernel

Joerg Schilling wrote on 2006-01-24:

> Arjan van de Ven <arjan@infradead.org> wrote:
> 
> > > And because this requirement is not specified in the relevant standards,
> > > it is wrong to assume valloc() returns locked pages. 
> >
> > is it? I sort of doubt that (I'm not a standards expert, but I'd
> > expect that "lock all in the future" applies to all memory, not just
> > mmap'd memory).
> 
> I concur:
> 
> Locking pages into core is a property/duty of the VM subsystem.

But where is this laid down in the standard? There must be some part
that defines this, else we cannot rely on it. The wording for malloc()
and mmap() or mlock() is different. One talks about address space and
mapping, whereas malloc() talks about "storage".

I just haven't got time to look for it now. That Solaris happens to
do it doesn't make it a standard.

-- 
Matthias Andree

* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-23 19:38             ` Joerg Schilling
  2006-01-23 20:30               ` Matthias Andree
@ 2006-01-23 20:30               ` Lee Revell
  2006-01-23 21:33                 ` Joerg Schilling
  1 sibling, 1 reply; 34+ messages in thread
From: Lee Revell @ 2006-01-23 20:30 UTC (permalink / raw)
  To: Joerg Schilling; +Cc: matthias.andree, arjan, linux-kernel

On Mon, 2006-01-23 at 20:38 +0100, Joerg Schilling wrote:
> Matthias Andree <matthias.andree@gmx.de> wrote:
> 
> > On Mon, 23 Jan 2006, Arjan van de Ven wrote:
> >
> > > hmm... curious that mlockall() succeeds with only a 32kb rlimit....
> >
> > It's quite obvious with the seteuid() shuffling behind the scenes of the
> > app, for the mlockall() runs with euid==0, and the later mmap() with euid!=0.
> >
> > Clearly the application should do both with the same privilege or raise
> > the RLIMIT_MEMLOCK while running with privileges.
> >
> > The question that's open is one for the libc guys: malloc(), valloc()
> > and others seem to use mmap() on some occasions (for some allocation
> > sizes) - at least malloc/malloc.c comments as of 2.3.4 suggest so -, and
> > if this isn't orthogonal to mlockall() and set[e]uid() calls, the glibc
> > is pretty deeply in trouble if the code calls mlockall(MLC_FUTURE) and
> > then drops privileges.
> 
> If the behavior described by Matthias is true for current Linux kernels,
> then there is a clean bug that needs fixing.
> 
> If the Linux kernel is not willing to accept the contract by 
> mlockall(MLC_FUTURE), then it should not accept the call at all.
> 
> In our case, the kernel did accept the call to mlockall(MLC_FUTURE), but later 
> ignores this contract. This bug should be fixed.

Joerg,

You will be happy to know that in future Linux distros, cdrecord will
not require setuid to mlock() and get SCHED_FIFO - both are now
controlled by rlimits, so if the distro ships with a sane PAM/group
configuration, all you will need to do is add cdrecord users to the
"realtime" or "cdrecord" or "audio" group.

This will take a while to make it into distros as it requires changes to
PAM and glibc in addition to the kernel.

Lee


* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-23 20:30               ` Lee Revell
@ 2006-01-23 21:33                 ` Joerg Schilling
  0 siblings, 0 replies; 34+ messages in thread
From: Joerg Schilling @ 2006-01-23 21:33 UTC (permalink / raw)
  To: schilling, rlrevell; +Cc: matthias.andree, linux-kernel, arjan

Lee Revell <rlrevell@joe-job.com> wrote:

> > In our case, the kernel did accept the call to mlockall(MLC_FUTURE), but later 
> > ignores this contract. This bug should be fixed.
>
> Joerg,
>
> You will be happy to know that in future Linux distros, cdrecord will
> not require setuid to mlock() and get SCHED_FIFO - both are now
> controlled by rlimits, so if the distro ships with a sane PAM/group
> configuration, all you will need to do is add cdrecord users to the
> "realtime" or "cdrecord" or "audio" group.
>
> This will take a while to make it into distros as it requires changes to
> PAM and glibc in addition to the kernel.

Well, on Solaris, running cdrecord root-less has been possible for 2 years.

What you do is to add a line

joerg::::profiles=CD RW

to /etc/user_attr

and a line:

CD RW:solaris:cmd:::/opt/schily/bin/cdrecord: privs=file_dac_read,sys_devices,proc_lock_memory,proc_priocntl,net_privaddr

to /etc/security/exec_attr

or just a line

All:solaris:cmd:::/opt/schily/bin/cdrecord: privs=file_dac_read,sys_devices,proc_lock_memory,proc_priocntl,net_privaddr

to /etc/security/exec_attr

The command is then executed via /usr/bin/pfexec and gets the listed
fine-grained privileges in addition to the basic privileges.

We plan to break sys_devices into more fine grained privs that
include several levels of SCSI rights in the near future.

If Linux manages to do something similar, I would be happy.
Obviously this is something that can only be used if there is not just
kernel code to support fine-grained privs but also a user space
infrastructure that allows seamless integration.

Jörg

-- 
 EMail:joerg@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js@cs.tu-berlin.de                (uni)  
       schilling@fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-23 18:55           ` Matthias Andree
  2006-01-23 19:04             ` Arjan van de Ven
  2006-01-23 19:38             ` Joerg Schilling
@ 2006-01-23 19:57             ` Lee Revell
  2006-01-23 21:34             ` Theodore Ts'o
  3 siblings, 0 replies; 34+ messages in thread
From: Lee Revell @ 2006-01-23 19:57 UTC (permalink / raw)
  To: Matthias Andree; +Cc: Arjan van de Ven, Linux-Kernel mailing list

On Mon, 2006-01-23 at 19:55 +0100, Matthias Andree wrote:
> I'm asking the Bcc'd gentleman to reconsider mlockall() and perhaps
> use explicit mlock() instead. 

Probably good advice, I have found mlockall() to be especially
problematic with multithreaded programs and NPTL, as glibc eats
RLIMIT_STACK of unswappable memory for each thread stack which defaults
to 8MB here - you go OOM really quick like this.  Most people don't seem
to realize the need to set a sane value with pthread_attr_setstack().
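
Setting an explicit stack size for a thread could look like the sketch below. It uses pthread_attr_setstacksize(), which only sets the size, in place of the pthread_attr_setstack() mentioned above, which also takes caller-provided stack memory. The function names are made up for this example, and older toolchains may need -pthread when linking.

```c
#include <pthread.h>
#include <stddef.h>

/* Placeholder thread body for the example. */
static void *worker(void *arg)
{
        return arg;
}

/*
 * Starts and joins one thread with an explicit stack size.
 * Returns 0 on success or the pthread error number.
 */
int run_with_stack_size(size_t stack_bytes)
{
        pthread_attr_t attr;
        pthread_t tid;
        int rc = pthread_attr_init(&attr);

        if (rc != 0)
                return rc;
        rc = pthread_attr_setstacksize(&attr, stack_bytes);
        if (rc == 0)
                rc = pthread_create(&tid, &attr, worker, NULL);
        pthread_attr_destroy(&attr);
        if (rc != 0)
                return rc;
        return pthread_join(tid, NULL);
}
```

A call such as run_with_stack_size(512 * 1024) keeps the per-thread stack far below the 8MB default mentioned above, which matters when every mapped page is locked.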

(Even when not mlock'ed, insanely huge thread stack defaults seem to
account for a lot of the visible bloat on the desktop - decreasing
RLIMIT_STACK to 512KB reduces the footprint of Gnome 2.12 by 100+ MB.)

Lee


* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-23 18:55           ` Matthias Andree
                               ` (2 preceding siblings ...)
  2006-01-23 19:57             ` Lee Revell
@ 2006-01-23 21:34             ` Theodore Ts'o
  2006-01-24 11:06               ` Matthias Andree
  3 siblings, 1 reply; 34+ messages in thread
From: Theodore Ts'o @ 2006-01-23 21:34 UTC (permalink / raw)
  To: Arjan van de Ven, Linux-Kernel mailing list

On Mon, Jan 23, 2006 at 07:55:49PM +0100, Matthias Andree wrote:
> The question that's open is one for the libc guys: malloc(), valloc()
> and others seem to use mmap() on some occasions (for some allocation
> sizes) - at least malloc/malloc.c comments as of 2.3.4 suggest so -, and
> if this isn't orthogonal to mlockall() and set[e]uid() calls, the glibc
> is pretty deeply in trouble if the code calls mlockall(MLC_FUTURE) and
> then drops privileges.

Maybe mlockall(MCL_FUTURE) when run with privileges should
automatically adjust the RLIMIT_MEMLOCK resource limit?

					- Ted


* Re: Rationale for RLIMIT_MEMLOCK?
  2006-01-23 21:34             ` Theodore Ts'o
@ 2006-01-24 11:06               ` Matthias Andree
  0 siblings, 0 replies; 34+ messages in thread
From: Matthias Andree @ 2006-01-24 11:06 UTC (permalink / raw)
  To: Theodore Ts'o, Arjan van de Ven, Linux-Kernel mailing list

On Mon, 23 Jan 2006, Theodore Ts'o wrote:

> On Mon, Jan 23, 2006 at 07:55:49PM +0100, Matthias Andree wrote:
> > The question that's open is one for the libc guys: malloc(), valloc()
> > and others seem to use mmap() on some occasions (for some allocation
> > sizes) - at least malloc/malloc.c comments as of 2.3.4 suggest so -, and
> > if this isn't orthogonal to mlockall() and set[e]uid() calls, the glibc
> > is pretty deeply in trouble if the code calls mlockall(MCL_FUTURE) and
> > then drops privileges.
> 
> Maybe mlockall(MCL_FUTURE) when run with privileges should
> automatically adjust the RLIMIT_MEMLOCK resource limit?

Adding special cases to no end.
Is this really sensible?

How about leaving RLIMIT_MEMLOCK alone (and at RLIM_INFINITY) for root
processes altogether? At least that wouldn't add a new special case but
just change the existing one to remove an inconsistency, and the effect
will be the same, only that it is inherited across seteuid().

I doubt that the kernel is the right place to implement policies that
belong in user space.  As long as the kernel is meant to be universal,
any default will collide with an application's requirement sooner or
later.

-- 
Matthias Andree
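
A minimal sketch of handling this in user space, assuming a program that
starts with privileges and later drops them. The helper names memlock_limits
and lock_then_drop are hypothetical. The order of operations is one possible
reading of the discussion above, not something the thread prescribes.

```c
#include <errno.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: report the current RLIMIT_MEMLOCK soft and hard
 * limits. Returns 0 on success, -1 with errno set. */
int memlock_limits(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/* Hypothetical helper: while still privileged, lift RLIMIT_MEMLOCK, lock
 * current and future mappings, then switch to the given uid.
 * Returns 0 on success, -1 with errno set. */
int lock_then_drop(uid_t uid)
{
    struct rlimit unlimited = { RLIM_INFINITY, RLIM_INFINITY };

    /* Raising the hard limit fails with EPERM without privilege. */
    if (setrlimit(RLIMIT_MEMLOCK, &unlimited) != 0)
        return -1;

    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return -1;

    /* Resource limits are kept across setuid(), so mappings created after
     * the drop are still covered by the raised limit. */
    return setuid(uid);
}
```

Because the limit is raised explicitly by the application, the kernel needs
no special case for privileged callers of mlockall().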


end of thread, other threads:[~2006-02-03 20:49 UTC | newest]

Thread overview: 34+ messages
     [not found] <5y7B5-1dw-15@gated-at.bofh.it>
     [not found] ` <5y7KL-1DZ-31@gated-at.bofh.it>
     [not found]   ` <5yddh-1pA-47@gated-at.bofh.it>
     [not found]     ` <5ydni-1Qq-3@gated-at.bofh.it>
     [not found]       ` <5yek1-3iP-53@gated-at.bofh.it>
     [not found]         ` <5yeth-3us-33@gated-at.bofh.it>
     [not found]           ` <5yf5O-4iF-19@gated-at.bofh.it>
     [not found]             ` <5yfI4-5kU-11@gated-at.bofh.it>
     [not found]               ` <5ygE4-6LK-35@gated-at.bofh.it>
     [not found]                 ` <5yhqg-7ZR-5@gated-at.bofh.it>
     [not found]                   ` <5yi2X-zm-7@gated-at.bofh.it>
2006-01-24  9:14                     ` CD writing in future Linux (stirring up a hornets' nest) (was: Rationale for RLIMIT_MEMLOCK?) Bodo Eggert
2006-01-24 14:38                       ` Joerg Schilling
2006-01-24 17:44                         ` CD writing in future Linux (stirring up a hornets' nest) Bodo Eggert
     [not found]               ` <5ygDT-6LK-3@gated-at.bofh.it>
     [not found]                 ` <5yscc-68j-5@gated-at.bofh.it>
     [not found]                   ` <5ysvk-6JI-5@gated-at.bofh.it>
     [not found]                     ` <5ysvk-6JI-3@gated-at.bofh.it>
     [not found]                       ` <5yEn7-7Or-21@gated-at.bofh.it>
     [not found]                         ` <5yUUI-6JR-15@gated-at.bofh.it>
2006-01-26  0:12                           ` Rationale for RLIMIT_MEMLOCK? Bodo Eggert
2006-02-03 20:49 Michael Kerrisk
  -- strict thread matches above, loose matches on Subject: below --
2006-01-23 10:56 Matthias Andree
2006-01-23 11:05 ` Arjan van de Ven
2006-01-23 16:54   ` Matthias Andree
2006-01-23 17:00     ` Arjan van de Ven
2006-01-23 18:01       ` Matthias Andree
2006-01-23 18:13         ` Arjan van de Ven
2006-01-23 18:55           ` Matthias Andree
2006-01-23 19:04             ` Arjan van de Ven
2006-01-23 19:38             ` Joerg Schilling
2006-01-23 20:30               ` Matthias Andree
2006-01-23 21:23                 ` Joerg Schilling
2006-01-23 22:05                   ` Matthias Andree
2006-01-24  8:52                 ` Arjan van de Ven
2006-01-24  9:08                   ` Joerg Schilling
2006-01-24  9:15                     ` Arjan van de Ven
2006-01-24  9:18                       ` Joerg Schilling
2006-01-24 21:28                       ` Theodore Ts'o
2006-01-24 23:19                         ` Edgar Toernig
2006-01-25 15:38                           ` Joerg Schilling
2006-01-24 23:26                         ` Matthias Andree
2006-01-24 23:27                           ` Matthias Andree
2006-01-25 15:33                         ` Joerg Schilling
2006-01-25 16:01                           ` Matthias Andree
2006-01-24 10:51                     ` Matthias Andree
2006-01-23 20:30               ` Lee Revell
2006-01-23 21:33                 ` Joerg Schilling
2006-01-23 19:57             ` Lee Revell
2006-01-23 21:34             ` Theodore Ts'o
2006-01-24 11:06               ` Matthias Andree
