From: John Berthels <john@humyo.com>
To: linux-kernel@vger.kernel.org
Cc: Nick Gregory <nick@humyo.com>, Rob Sanderson <rob@humyo.com>
Subject: PROBLEM + POSS FIX: kernel stack overflow, xfs, many disks, heavy write load, 8k stack, x86-64
Date: Wed, 07 Apr 2010 12:06:01 +0100
Message-ID: <4BBC6719.7080304@humyo.com>

Hi folks,

[I'm afraid that I'm not subscribed to the list, please cc: me on any 
reply].

Problem: kernel.org 2.6.33.2 x86_64 kernel locks up under write-heavy 
I/O load. It is "fixed" by changing THREAD_ORDER to 2.

Is this an OK long-term solution, and should it even be needed? As far
as I can see from searching, there is an expectation that xfs would
generally work with 8k stacks (THREAD_ORDER 1). We don't have xfs
stacked over LVM or anything else.
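
For reference, THREAD_ORDER is the define that sizes the per-thread
kernel stack as a power-of-two number of pages (in 2.6.33 it lives in
arch/x86/include/asm/page_64_types.h, if I have the path right). Our
change is simply:

    /* arch/x86/include/asm/page_64_types.h (2.6.33)
     * order 1 = two 4k pages = 8k stack; order 2 = four pages = 16k */
    #define THREAD_ORDER    2    /* was 1 */
    #define THREAD_SIZE     (PAGE_SIZE << THREAD_ORDER)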

If anyone can offer any advice on this, that would be great. I
understand that larger kernel stacks need higher-order allocations,
which can fail once memory becomes fragmented. So am I right in
thinking the symptom we need to look out for would be an error on
fork() or clone()? Or will the box panic in that case?
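
For what it's worth, this is the sort of check I have in mind: a
minimal userspace sketch, assuming a failed order-2 stack allocation
surfaces as a fork() error rather than a panic:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();
        if (pid < 0) {
            /* If the kernel can't find a contiguous order-2 block for
             * the new task's stack, fork() should fail here (errno
             * ENOMEM or EAGAIN) rather than panic the box. */
            fprintf(stderr, "fork failed: %s\n", strerror(errno));
            return 1;
        }
        if (pid == 0)
            _exit(0);    /* child: nothing to do */
        return 0;
    }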

Details below.

regards,

jb


Background: we have a cluster of systems, each with roughly the
following spec: 2GB RAM, 24 (twenty-four) 1TB+ disks, Intel Core2 Duo
@ 2.2GHz.

Following the addition of three new servers to the cluster, we started
seeing a high incidence of intermittent lockups (up to several times
per day for some servers) across both the old and new servers. Prior
to that, we saw this problem only rarely (perhaps once every three
months).

Adding the new servers will have changed the I/O patterns to all 
servers. The servers receive a heavy write load, often with many slow 
writers (as well as a read load).

Servers would become unresponsive, with nothing written to
/var/log/messages. Setting sysctl kernel.panic=300 caused a restart
(which showed the kernel was panicking and unable to write at the
time). netconsole showed a variety of stack traces, mostly related to
xfs_write activity (but then, that's what the box spends its time
doing).
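
(kernel.panic is the reboot-after-panic timeout in seconds. We set it
with sysctl, but that is equivalent to a plain write to procfs; a
minimal sketch:)

    /* Equivalent of "sysctl kernel.panic=300": reboot 300s after panic. */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/proc/sys/kernel/panic", "w");
        if (!f) {
            perror("/proc/sys/kernel/panic");
            return 1;
        }
        fputs("300\n", f);
        return fclose(f) == 0 ? 0 : 1;
    }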

22/24 of the disks have 1 partition, formatted with xfs (over the 
partition, not over LVM). The other 2 disks have 3 partitions: xfs data, 
swap and a RAID1 partition contributing to an ext3 root filesystem 
mounted on /dev/md0.

We have tried various fixes, including different Ubuntu Server kernels
(2.6.28 through 2.6.32).

Vanilla 2.6.33.2 from kernel.org + stack tracing still has the problem, 
and logged:

kernel: [58552.740032] flush-8:112 used greatest stack depth: 184 bytes left

a short while before dying.
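
As I understand it, that message is printed by the
CONFIG_DEBUG_STACK_USAGE check which runs as each process exits. A
paraphrased sketch from memory (not a verbatim copy of kernel/exit.c):

    /* Paraphrase of the 2.6.33-era CONFIG_DEBUG_STACK_USAGE check:
     * on process exit, measure how much stack was never dirtied and
     * log a new record low. */
    static void check_stack_usage(void)
    {
        static int lowest_to_date = THREAD_SIZE;
        unsigned long free = stack_not_used(current); /* bytes untouched */

        if (free < lowest_to_date) {
            printk(KERN_WARNING
                   "%s used greatest stack depth: %lu bytes left\n",
                   current->comm, free);
            lowest_to_date = free;
        }
    }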

Vanilla 2.6.33.2 + stack tracing + THREAD_ORDER 2 is much more stable 
(no lockups so far, we would have expected 5-6 by now) and has logged:

kernel: [44798.183507] apache2 used greatest stack depth: 7208 bytes left

which I understand (possibly wrongly) as concrete evidence that we have 
exceeded 8k of stack space.
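
Spelling out the arithmetic behind that reading (assuming 4k pages):

    /* Sanity-check the "bytes left" figure against both stack sizes. */
    #include <stdio.h>

    int main(void)
    {
        const long page = 4096;
        const long stack8k  = page << 1;    /* THREAD_ORDER 1 */
        const long stack16k = page << 2;    /* THREAD_ORDER 2 */

        /* apache2 left 7208 bytes free on the 16k stack, so the
         * deepest use we have observed is 16384 - 7208 = 9176 bytes,
         * which does not fit in an 8k (8192 byte) stack. */
        printf("deepest use: %ld bytes; 8k stack is only %ld bytes\n",
               stack16k - 7208, stack8k);
        return 0;
    }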

