From: Sander <sander@humilis.net>
To: cwillu <cwillu@cwillu.com>
Cc: sander@humilis.net, Daniel Poelzleithner <poelzi@poelzi.org>,
linux-btrfs@vger.kernel.org
Subject: Re: flush-btrfs-1 hangs when building openwrt
Date: Mon, 31 Jan 2011 12:40:51 +0100
Message-ID: <20110131114051.GA11121@attic.humilis.net>
In-Reply-To: <AANLkTi=e2+HEo7hSN9rudjCufV23UX9rsoGPe9HY=Mjz@mail.gmail.com>
cwillu wrote (ao):
> On Mon, Jan 31, 2011 at 5:18 AM, Sander <sander@humilis.net> wrote:
> > cwillu wrote (ao):
> >> On Mon, Jan 31, 2011 at 4:52 AM, Sander <sander@humilis.net> wrote:
> >> > It started with hanging jobs on the backup disk. I stopped cron and
> >> > could kill most of the jobs. Some are still hanging though.
> >> >
> >> > Since then (uptime 12 days) I see hanging procmail processes, and an
> >> > apt-get upgrade last week gave an unkillable dpkg process. All these have
> >> > nothing to do with the backup disk. CPU is maxed out:
> >> >
> >> > top - 11:49:54 up 12 days,  1:19, 31 users,  load average: 13.54, 13.41, 13.36
> >> > Tasks: 201 total,  13 running, 187 sleeping,   0 stopped,   1 zombie
> >> > Cpu(s): 41.5%us, 58.5%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> >> > Mem:    515004k total,   400824k used,   114180k free,       28k buffers
> >> > Swap:  4302560k total,   173988k used,  4128572k free,   202948k cached
> >> >
> >> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >> >  1592 ookhoi    20   0  2716  456  348 S  1.9  0.1  25:17.42 showNewMail2
> >> >  6761 ookhoi    20   0  2736 1000  704 S  1.3  0.2  61:21.93 top
> >> > 27609 ookhoi    20   0  2736 1264  936 R  1.3  0.2   0:01.06 top
> >> > 30678 ookhoi    20   0  2736  892  584 S  1.3  0.2  91:37.75 top
> >> >  6036 ookhoi    39  19  2692   64   52 R  1.0  0.0 869:46.32 procmail
> >> > 11373 ookhoi    39  19  4800   64   52 R  1.0  0.0 714:25.88 procmail
> >> > 18871 root      39  19  2540   32   20 R  1.0  0.0   1528:51 lzop
> >> > 18894 ookhoi    39  19  2692   64   52 R  1.0  0.0 611:16.18 procmail
> >> > 20305 ookhoi    39  19  2692   68   56 R  1.0  0.0 610:51.97 procmail
> >> > 20378 ookhoi    39  19  2692   68   56 R  1.0  0.0 610:50.75 procmail
> >> > 23661 ookhoi    39  19  2692   80   68 R  1.0  0.0   1308:23 procmail
> >> > 25091 root      20   0     0    0    0 S  1.0  0.0   0:25.63 flush-btrfs-2
> >> > 26409 root      39  19  2264   32   28 R  1.0  0.0   1526:42 mv
> >> > 27606 ookhoi    39  19  9084   40   28 R  1.0  0.0   3637:39 procmail
> >> > 27910 root      39  19 15096 3756  304 R  1.0  0.7 638:46.62 dpkg
> >> > 11804 ookhoi    39  19  4700   64   52 R  0.6  0.0 714:08.67 procmail
> >> >     3 root      20   0     0    0    0 R  0.3  0.0   9:39.76 ksoftirqd/0
> >> >
> >> >
> >> > What can I do to provide more info?
> >>
> >> alt-sysrq-w, and then the dmesg output, which will then contain a
> >> backtrace for every blocked process.
> >
> > Thanks cwillu.
> >
> > Seems only two processes. And these are related to the backup disk
> > (which might or might not be broken: can't access it anymore).
> >
> > Nothing to do with the procmail and dpkg processes.
> >
> >
> > [1042949.513831] SysRq : Show Blocked State
> > [1042949.517776]   task                PC stack   pid father
> > [1042949.523247] cat           D c0475dd0     0 30063      1 0x00000001
> > [1042949.529668] [<c0475dd0>] (schedule+0x344/0x398) from [<c04764ec>] (__mutex_lock_slowpath+0x64/0x88)
> > [1042949.538943] [<c04764ec>] (__mutex_lock_slowpath+0x64/0x88) from [<c01af0e8>] (do_lookup+0x90/0x128)
> > [1042949.548209] [<c01af0e8>] (do_lookup+0x90/0x128) from [<c01b03f4>] (do_last+0x198/0x5b8)
> > [1042949.556432] [<c01b03f4>] (do_last+0x198/0x5b8) from [<c01b20f8>] (do_filp_open+0x168/0x49c)
> > [1042949.565004] [<c01b20f8>] (do_filp_open+0x168/0x49c) from [<c01a555c>] (do_sys_open+0x58/0x11c)
> > [1042949.573838] [<c01a555c>] (do_sys_open+0x58/0x11c) from [<c0136ee0>] (ret_fast_syscall+0x0/0x2c)
> > [1042949.582750] cat           D c0475dd0     0  4591      1 0x00000001
> > [1042949.589152] [<c0475dd0>] (schedule+0x344/0x398) from [<c04764ec>] (__mutex_lock_slowpath+0x64/0x88)
> > [1042949.598418] [<c04764ec>] (__mutex_lock_slowpath+0x64/0x88) from [<c01af0e8>] (do_lookup+0x90/0x128)
> > [1042949.607687] [<c01af0e8>] (do_lookup+0x90/0x128) from [<c01b03f4>] (do_last+0x198/0x5b8)
> > [1042949.615910] [<c01b03f4>] (do_last+0x198/0x5b8) from [<c01b20f8>] (do_filp_open+0x168/0x49c)
> > [1042949.624482] [<c01b20f8>] (do_filp_open+0x168/0x49c) from [<c01a555c>] (do_sys_open+0x58/0x11c)
> > [1042949.633315] [<c01a555c>] (do_sys_open+0x58/0x11c) from [<c0136ee0>] (ret_fast_syscall+0x0/0x2c)
>
> dpkg and procmail were just showing up for you in top because it was
> sorting by memory usage, which isn't what we were looking for here.
It was not. The CPU numbers were low due to a 'find' which consumes a
lot now and then. This one shows better:
top - 12:32:22 up 12 days, 2:01, 32 users, load average: 13.48, 13.37, 13.39
Tasks: 199 total, 12 running, 186 sleeping, 0 stopped, 1 zombie
Cpu(s): 0.0%us, 75.4%sy, 0.0%ni, 24.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 515004k total, 366200k used, 148804k free, 28k buffers
Swap: 4302560k total, 174188k used, 4128372k free, 170124k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11804 ookhoi 39 19 4700 64 52 R 8.8 0.0 717:10.30 procmail
6036 ookhoi 39 19 2692 64 52 R 8.5 0.0 872:47.95 procmail
18871 root 39 19 2540 32 20 R 8.5 0.0 1531:53 lzop
20305 ookhoi 39 19 2692 68 56 R 8.5 0.0 613:53.59 procmail
20378 ookhoi 39 19 2692 68 56 R 8.5 0.0 613:52.37 procmail
23661 ookhoi 39 19 2692 80 68 R 8.5 0.0 1311:24 procmail
27910 root 39 19 15096 3748 304 R 8.5 0.7 641:48.25 dpkg
11373 ookhoi 39 19 4800 64 52 R 8.2 0.0 717:27.50 procmail
18894 ookhoi 39 19 2692 64 52 R 8.2 0.0 614:17.80 procmail
26409 root 39 19 2264 32 28 R 8.2 0.0 1529:44 mv
27606 ookhoi 39 19 9084 40 28 R 8.2 0.0 3640:41 procmail
11120 root 20 0 0 0 0 S 5.6 0.0 0:02.94 flush-btrfs-2
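
For anyone who wants to check the sort order rather than eyeball it, here is a minimal sketch in Python. It is not something I ran on the box at the time; the helper names parse_rows and is_descending are made up for illustration, and the rows are simply copied from the listing above. It tests which column is non-increasing from top to bottom.

```python
# Rows copied verbatim from the second top listing in this message.
TOP_ROWS = """\
11804 ookhoi 39 19 4700 64 52 R 8.8 0.0 717:10.30 procmail
6036 ookhoi 39 19 2692 64 52 R 8.5 0.0 872:47.95 procmail
18871 root 39 19 2540 32 20 R 8.5 0.0 1531:53 lzop
20305 ookhoi 39 19 2692 68 56 R 8.5 0.0 613:53.59 procmail
20378 ookhoi 39 19 2692 68 56 R 8.5 0.0 613:52.37 procmail
23661 ookhoi 39 19 2692 80 68 R 8.5 0.0 1311:24 procmail
27910 root 39 19 15096 3748 304 R 8.5 0.7 641:48.25 dpkg
11373 ookhoi 39 19 4800 64 52 R 8.2 0.0 717:27.50 procmail
18894 ookhoi 39 19 2692 64 52 R 8.2 0.0 614:17.80 procmail
26409 root 39 19 2264 32 28 R 8.2 0.0 1529:44 mv
27606 ookhoi 39 19 9084 40 28 R 8.2 0.0 3640:41 procmail
11120 root 20 0 0 0 0 S 5.6 0.0 0:02.94 flush-btrfs-2
"""


def parse_rows(text):
    """Split each top row into the few fields we care about (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 12:
            continue  # not a process row
        rows.append({
            "pid": int(fields[0]),
            "state": fields[7],
            "cpu": float(fields[8]),
            "mem": float(fields[9]),
            "command": fields[11],
        })
    return rows


def is_descending(values):
    """True if every value is >= the one that follows it."""
    return all(a >= b for a, b in zip(values, values[1:]))


rows = parse_rows(TOP_ROWS)
sorted_by_cpu = is_descending([r["cpu"] for r in rows])
sorted_by_mem = is_descending([r["mem"] for r in rows])
print("sorted by %CPU:", sorted_by_cpu)
print("sorted by %MEM:", sorted_by_mem)
```

The dpkg row with 0.7 %MEM sits in the middle of rows with 0.0 %MEM, which is what breaks the memory ordering.
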
> In your case, the blocking is almost certainly due to your failing
> disk.
Also for procmail and dpkg? Neither operates on the disk that seems
to be failing, which is located under /holding/.
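
To make the comparison concrete, the blocked tasks can be pulled out of the SysRq dump with a small script. This is only a sketch: the function name blocked_tasks and the regular expression are my own, and the line layout it assumes is the one in the dump quoted earlier in this message, which other kernels may print differently.

```python
import re

# Task header lines copied from the SysRq output quoted earlier in this
# message, plus the banner and one backtrace line that must be skipped.
DMESG = """\
[1042949.513831] SysRq : Show Blocked State
[1042949.523247] cat           D c0475dd0     0 30063      1 0x00000001
[1042949.529668] [<c0475dd0>] (schedule+0x344/0x398) from [<c04764ec>] (__mutex_lock_slowpath+0x64/0x88)
[1042949.582750] cat           D c0475dd0     0  4591      1 0x00000001
"""

# Assumed layout: [timestamp] name D <pc> <stack> <pid> <father> <flags>
TASK_LINE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+"
    r"(?P<name>\S+)\s+D\s+"
    r"[0-9a-f]+\s+\d+\s+"
    r"(?P<pid>\d+)\s+\d+\s+0x[0-9a-f]+$"
)


def blocked_tasks(text):
    """Return (name, pid) for every task header line in state D."""
    tasks = []
    for line in text.splitlines():
        match = TASK_LINE.match(line)
        if match:
            tasks.append((match.group("name"), int(match.group("pid"))))
    return tasks


print(blocked_tasks(DMESG))
```

On the dump above this lists only the two cat processes, while procmail, dpkg, lzop and mv show state R in top and do not appear among the blocked tasks.
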
Anyway, I'll reboot the machine this afternoon with the suspect disk
removed.
Thanks again for your reply cwillu.
Sander
--
Humilis IT Services and Solutions
http://www.humilis.net
Thread overview: 8+ messages
2011-01-25 8:51 flush-btrfs-1 hangs when building openwrt Daniel Poelzleithner
2011-01-25 15:30 ` Josef Bacik
2011-01-25 21:16 ` Daniel Poelzleithner
2011-01-31 10:52 ` Sander
2011-01-31 11:08 ` cwillu
2011-01-31 11:18 ` Sander
2011-01-31 11:26 ` cwillu
2011-01-31 11:40 ` Sander [this message]