linux-kernel.vger.kernel.org archive mirror
* 2.6.28, rlimits, performance and debian etch
@ 2009-01-21 11:52 Peter Palfrader
  2009-01-23 21:07 ` Florian Weimer
  2009-01-27 23:17 ` Andrew Morton
  0 siblings, 2 replies; 14+ messages in thread
From: Peter Palfrader @ 2009-01-21 11:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: DSA, team, libpam-modules

Hi,

I spent several hours trying to get to the bottom of a serious
performance issue that appeared on one of our servers after upgrading to
2.6.28.  In the end it's what could be considered a userspace bug that
was triggered by a change in 2.6.28.  Since this might also affect other
people I figured I'd at least document what I found here, and maybe we
can even do something about it:


So, I upgraded some of debian.org's machines to 2.6.28.1 and immediately
the team maintaining our ftp archive complained that one of their
scripts that previously ran in a few minutes still hadn't even come
close to being done after an hour or so.  Downgrading to 2.6.27 fixed
that.

Turns out that script forks a lot, and something in it (or Python, or
wherever) closes all the file descriptors it doesn't want to pass on.
That is, it starts at zero, goes up to ulimit -n (RLIMIT_NOFILE), and
closes them all, with a few exceptions.

Turns out that takes a long time when your ulimit -n is now 2^20 (1048576).
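
The close-everything pattern described above can be sketched roughly as
follows.  This is only an illustration, not the code from the affected
script: the helper names are made up, and reading /proc/self/fd is an
assumed, Linux-specific alternative.  It contrasts visiting every
descriptor number below the limit with visiting only the descriptors
that are actually open.

```python
import os
import resource


def soft_nofile_limit():
    """Return the soft RLIMIT_NOFILE, i.e. what `ulimit -n` reports."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def bruteforce_candidates(limit, keep=(0, 1, 2)):
    """Every descriptor number below `limit`, open or not, minus `keep`.

    This is the costly pattern: one close() attempt per number.
    """
    return (fd for fd in range(limit) if fd not in keep)


def open_fd_candidates(keep=(0, 1, 2), fd_dir="/proc/self/fd"):
    """Only the descriptors that are really open (Linux-specific).

    The listing also contains the descriptor used to read the directory
    itself, which is closed again by the time this function returns, so a
    caller that closes these descriptors must tolerate EBADF.
    """
    fds = (int(name) for name in os.listdir(fd_dir))
    return sorted(fd for fd in fds if fd not in keep)
```

With a limit of 1048576 the first helper yields over a million
candidates per fork, while the second typically yields a handful.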

With 2.6.27.* the ulimit -n was the standard 1024, but with 2.6.28 it is
now 1024 times that.

2.6.28 included a patch titled "rlimit: permit setting RLIMIT_NOFILE to
RLIM_INFINITY" (0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f)[1] that
allows, as the title implies, setting the limit on the number of open
files to infinity.

Closer investigation showed that the broken default ulimit did not apply
to "system" processes (like stuff started from init).  In the end I
could establish that every process that had passed through pam_limits at
some point had the bad resource limit.
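
One way to compare limits across processes, on kernels that provide
/proc/<pid>/limits, is sketched below.  The message does not say how the
comparison was actually done, so treat this as an assumption; the
function names are hypothetical.

```python
def parse_max_open_files(limits_text):
    """Return (soft, hard) from the text of /proc/<pid>/limits.

    "unlimited" is reported as None.  Raises ValueError if the
    "Max open files" row is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max open files", soft limit, hard limit, units.
            soft, hard = line.split()[3:5]
            return tuple(None if v == "unlimited" else int(v)
                         for v in (soft, hard))
    raise ValueError("no 'Max open files' row found")


def max_open_files(pid):
    """Read the nofile limits of a running process (Linux-specific)."""
    with open(f"/proc/{pid}/limits") as fh:
        return parse_max_open_files(fh.read())
```

Calling max_open_files() on a daemon started from init and on a shell
from a login session should show whether the two differ.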

Apparently the pam library in Debian etch (4.0) initializes the limits
to some default values when it doesn't have any settings in limits.conf
to override them.  Turns out that for nofile this is RLIM_INFINITY.
Commenting out "case RLIMIT_NOFILE" in pam_limits.c:267 of our pam
package version 0.79-5 fixes that, though I'm not sure what side effects
that has.

Debian lenny (the upcoming 5.0 version) doesn't have this issue as it
uses a different pam (version).


I'm a bit unsure where to go from here.  Maybe the pam library in etch
should be fixed.  Maybe the patch should be reverted (but then it may be
more correct now and that's what the changelog entry suggests).
As a stopgap measure I could also just define nofile in limits.conf.
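
A different, per-process stopgap (not the limits.conf change mentioned
above) would be for the affected script to lower its own soft limit
before it starts forking, since children inherit it.  A minimal sketch,
assuming the script is Python and that 1024 is an acceptable cap; the
function name is hypothetical:

```python
import resource


def cap_soft_nofile(cap=1024):
    """Lower the soft RLIMIT_NOFILE to at most `cap`; never raise it.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        cap = min(cap, hard)  # the soft limit may not exceed the hard limit
    if soft == resource.RLIM_INFINITY or soft > cap:
        resource.setrlimit(resource.RLIMIT_NOFILE, (cap, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Only the soft limit is touched, so a child that really needs more
descriptors can still raise it again up to the hard limit.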

Thanks for listening.  Also thanks to Rik and Nocholas who helped track
some of this down.

Cheers,
Peter
[1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f
    http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f
-- 
                           |  .''`.  ** Debian GNU/Linux **
      Peter Palfrader      | : :' :      The  universal
 http://www.palfrader.org/ | `. `'      Operating System
                           |   `-    http://www.debian.org/

* Re: 2.6.28, rlimits, performance and debian etch
@ 2009-02-26 21:48 Frans Pop
  2009-02-26 22:01 ` Steve Langasek
  2009-02-27  7:30 ` Peter Palfrader
  0 siblings, 2 replies; 14+ messages in thread
From: Frans Pop @ 2009-02-26 21:48 UTC (permalink / raw)
  To: linux-kernel; +Cc: libpam-modules, debian-admin

Steve Langasek wrote:
> It has already been mentioned that this does not apply to the upcoming
> Debian 5.0 release (lenny); this patch is only present in the 4.0
> release (etch), it was actually fixed in the development series to not
> use RLIM_INFINITY *because* previous kernels didn't support this and
> would cause pam_limits to throw log warnings.

I've just migrated my home servers from Debian etch to lenny and bind9 now 
gives me this:

named[17207]: max open files (1024) is smaller than max sockets (4096)

Red Hat's BTS [1] tells me this is a kernel issue that should be solved [2]
by the "rlimit: permit setting RLIMIT_NOFILE to RLIM_INFINITY" patch in
2.6.28.

One of my servers (DNS slave) is running the standard Debian 2.6.26 
kernel, the other (DNS master) is running 2.6.29-rc6 so that does include 
that patch. But both show the error!

I'd appreciate your input on where to take this.

Cheers,
FJP

[1] https://bugzilla.redhat.com/show_bug.cgi?id=477540
[2] https://bugzilla.redhat.com/show_bug.cgi?id=461458

Thread overview: 14+ messages
2009-01-21 11:52 2.6.28, rlimits, performance and debian etch Peter Palfrader
2009-01-23 21:07 ` Florian Weimer
2009-01-23 22:02   ` David Daney
2009-01-23 23:11     ` Peter Palfrader
2009-01-25 10:59     ` Florian Weimer
2009-01-27 23:17 ` Andrew Morton
2009-01-29 12:19   ` Adam Tkac
2009-01-29 18:05     ` Andrew Morton
2009-01-29 18:10       ` Peter Palfrader
2009-02-02 16:20       ` Adam Tkac
2009-02-08 22:31     ` Steve Langasek
2009-02-26 21:48 Frans Pop
2009-02-26 22:01 ` Steve Langasek
2009-02-27  7:30 ` Peter Palfrader
